THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER)


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS


Contents 


Parabolic Test for Linkage. N. L. JOHNSON

Reduction of a Certain Class of Composite Statistical Hypotheses.
GEORGE W. BROWN

The Selection of Variates for Use in Prediction with some Comments
on the Problem of Nuisance Parameters. HAROLD HOTELLING

The Fitting of Straight Lines if Both Variables are Subject to
Error. ABRAHAM WALD

A Method for Minimizing the Sum of Absolute Values of Deviations.
ROBERT R. SINGLETON

A Study of a Universe of n Finite Populations with Application to
Moment-Function Adjustments for Grouped Data. JOSEPH

The Analysis of Variance when Experimental Errors follow the
Poisson or Binomial Laws. W. G. COCHRAN


Notes:

Orthogonal Polynomials Applied to Least Square Fitting of Weighted
Observations. BRADFORD F. KIMBALL

Combinatorial Formulas for the rth Standard Moment of the Sample
Sum, of the Sample Mean, and of the Normal Curve. P. S. DWYER

On a Method of Sampling.

Rank Correlation when there are Equal Variates. MAX A. WOODBURY

Note on Theoretical and Observed Distributions of Repetitive
Occurrences. P. S. OLMSTEAD


Vol. XI, No. 3 — September, 1940 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 


S. S. WILKS, Editor
A. T. CRAIG        J. NEYMAN


WITH THE COOPERATION OF


H. C. CARVER       R. A. FISHER       R. DE MISES
H. CRAMÉR          T. C. FRY          E. S. PEARSON
W. E. DEMING       H. HOTELLING       H. L. RIETZ
G. DARMOIS         W. A. SHEWHART


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, P. R. Rider, Washington University, St.
Louis, Mo.


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $4.00 per year. Single copies $1.25. 
Back numbers are available at the following rates: 


Vols. I-IV $5.00 each. Single numbers $1.50. 
Vols. V to date $4.00 each. Single numbers $1.25. 


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 













PARABOLIC TEST FOR LINKAGE 
By N. L. JOHNSON


1. Introduction. In this paper a problem in testing statistical hypotheses
which has applications in genetics will be treated from the standpoint of the
Neyman-Pearson approach. This approach has been developed in a series of
papers, [4], [5], [6], [7], [8], [9], [10], to which the reader is referred for definitions
of the concepts of a simple statistical hypothesis, critical regions, power function
of a test with respect to alternative hypotheses, and that of a test unbiased in
the limit employed in the present paper.


2. Statement of Problem. We shall consider M independent experiments,
which will each yield results falling into one of the four categories described by
the possible combinations of the 4 events a, not-a (or $\bar a$), b, and not-b (or $\bar b$)
as set up in the following table.


                 b           not-b         Total
    a            p_1         p_2           P_1
    not-a        p_3         p_4           1 - P_1
    Total        P_2         1 - P_2       1


We shall assume that the marginal probabilities are known and have values
$P_1$, $1 - P_1$, $P_2$, $1 - P_2$ as shown in the table. Thus $P_2$ = probability of
event b happening whether event a occurs or not. It is obvious that if, further,
the probability of a result falling in any one category or cell is fixed, then the
other three cell probabilities will also be fixed. For if $p_1, p_2, p_3, p_4$ be the
four cell probabilities as shown in the table above, we must have


(1) $p_1 + p_2 = P_1; \quad p_1 + p_3 = P_2; \quad p_2 + p_4 = 1 - P_2.$


Hence the values of the cell probabilities will be determined by a single parameter
$\theta$, say, as follows


(2) $p_1 = P_1 P_2 e^{\theta}, \qquad p_2 = P_1(1 - P_2 e^{\theta}),$
    $p_3 = P_2(1 - P_1 e^{\theta}), \qquad p_4 = 1 - P_1 - P_2 + P_1 P_2 e^{\theta}.$


The range of values which $\theta$ may take for the set of admissible hypotheses is
found from the conditions

(3) $0 \le p_i \le 1 \quad (i = 1, 2, 3, 4)$

to be

(4) $-\infty < \theta < \min(-\log P_1, -\log P_2)$ if $P_1 + P_2 \le 1$

but

(5) $\log(P_1^{-1} + P_2^{-1} - P_1^{-1}P_2^{-1}) < \theta < \min(-\log P_1, -\log P_2)$ if $P_1 + P_2 > 1.$
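The parametrization (2) and the admissible range (4)-(5) can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names `cell_probabilities` and `theta_bounds` are illustrative assumptions, and the numerical values are chosen only for demonstration.

```python
import math

def cell_probabilities(P1, P2, theta):
    """Cell probabilities (p1, p2, p3, p4) of equation (2)."""
    e = math.exp(theta)
    return (P1 * P2 * e,
            P1 * (1 - P2 * e),
            P2 * (1 - P1 * e),
            1 - P1 - P2 + P1 * P2 * e)

def theta_bounds(P1, P2):
    """Bounds on theta from (4) and (5); the lower bound is -inf when P1 + P2 <= 1."""
    upper = min(-math.log(P1), -math.log(P2))
    if P1 + P2 <= 1:
        return -math.inf, upper
    return math.log(1 / P1 + 1 / P2 - 1 / (P1 * P2)), upper

# Illustrative values only: P1 = P2 = 1/4 with a small positive theta.
p1, p2, p3, p4 = cell_probabilities(0.25, 0.25, 0.1)
margins = (p1 + p2, p1 + p3)      # should reproduce (P1, P2), as required by (1)
```

At the lower bound in (5) the fourth cell probability vanishes, and at the upper bound one of $p_2$, $p_3$ vanishes, which is how the limits arise from (3).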


The hypothesis tested, $H_0$, is that $\theta = 0$, i.e. that the events a and b are
independent. It will be noticed that $H_0$ is a simple hypothesis, since it specifies
the probability law of the observed variables completely. In fact, if $m_i$ be
the number of results out of our M experiments which are in the i-th category,
then $m_1, m_2, m_3, m_4$ are our observed variables, and we have


(6) $P\{m_1 = m_1', m_2 = m_2', m_3 = m_3', m_4 = m_4' \mid H_0\} = \dfrac{M!}{m_1'!\,m_2'!\,m_3'!\,m_4'!}\; p_{01}^{m_1'}\, p_{02}^{m_2'}\, p_{03}^{m_3'}\, p_{04}^{m_4'}$


where $p_{0i}$ is the value of $p_i$ when $\theta = 0$.

This is the conceptual model used in testing for linkage in two pairs of genes;
$H_0$ corresponds to the hypothesis “there is no linkage.” Fuller explanations
are given by Fisher [3]. It should be noted, however, that Fisher uses a
parameter $\theta$ corresponding to $\tfrac14 e^{\theta}$ in this paper.


3. Basis of Selection of Test. The question now arises: what test shall we
choose for the hypothesis $H_0$? That is, what should the critical region w be
to give us results as satisfactory as possible? The main aim must be to avoid
errors, both of first and second kind, as far as possible. The first kind of error
is subject to control, since the probability of the sample point E falling in w
when $H_0$ is true (which we shall denote by $P\{E \in w \mid H_0\}$) can be determined
approximately, $H_0$ being simple. The critical region w is therefore chosen, if
possible, to give a definite level of significance to the test associated with it.
However, there will usually be many regions which will do this, and in
order to decide which of them give more satisfactory results we consider
$(1 - P\{E \in w \mid H\})$; i.e. the probability of the second kind of error with respect
to an alternative hypothesis H, the first kind of error being fixed.

In the present case H will be determined by $\theta$ and so we may put
$P\{E \in w \mid H\} = \beta(w \mid \theta)$, where $\beta(w \mid \theta)$, considered as a function of $\theta$, will be
the power function of the test associated with the critical region w. We want
w to be such that $\beta(w \mid 0) = \alpha$, $\alpha$ being the fixed level of significance, while
$\beta(w \mid \theta)$ is as large as possible.

It is also desirable that we should accept the hypothesis $H_0$ more often when
it is true than when any one of the alternative hypotheses (H) is true. Expressed
symbolically, this means that

(7) $\beta(w \mid 0) < \beta(w \mid \theta)$ for all $\theta \ne 0.$

Any test satisfying the last condition is said to be unbiased.
If $\beta$ and $\partial\beta/\partial\theta$ are each continuous and differentiable functions of $\theta$, and we
consider only those alternative hypotheses specified by suitably small values
of $\theta$, sufficient conditions for the test to be unbiased will be

(8) $\left.\dfrac{\partial \beta}{\partial \theta}\right|_{\theta=0} = 0,$

(9) $\left.\dfrac{\partial^2 \beta}{\partial \theta^2}\right|_{\theta=0} > 0.$


According to the terminology recently adopted by Daly [1], the tests of
which it is known only that they satisfy (8) and (9) are called locally unbiased.
If a region w could be found such that, v being any other region for which

(10) $\beta(w \mid 0) = \beta(v \mid 0)$, then $\beta(w \mid \theta) > \beta(v \mid \theta)$

for all $\theta \ne 0$, this would give a test which would be the best with respect to any
alternative hypothesis. However, it has been shown by Neyman [4] that under
certain conditions, which many probability laws satisfy, such a test will not
exist. An attempt is therefore made to control the power of the test with
respect to hypotheses specifying values of $\theta$ near to 0, hoping that the powers
of the tests so obtained with respect to the other hypotheses will behave in a
satisfactory manner. Thus Neyman and Pearson [9] define an “unbiased test
of Type A” as a test corresponding to a critical region w such that if v be any
other region in the sample space W for which

(11) $\beta(w \mid 0) = \beta(v \mid 0) = \alpha$

and

(12) $\left.\dfrac{\partial \beta(w \mid \theta)}{\partial \theta}\right|_{\theta=0} = \left.\dfrac{\partial \beta(v \mid \theta)}{\partial \theta}\right|_{\theta=0} = 0$

then

(13) $\left.\dfrac{\partial^2 \beta(w \mid \theta)}{\partial \theta^2}\right|_{\theta=0} > \left.\dfrac{\partial^2 \beta(v \mid \theta)}{\partial \theta^2}\right|_{\theta=0}.$

In the problem which I am treating the conditions

(14) $\beta(w \mid 0) = \alpha; \qquad \left.\dfrac{\partial \beta(w \mid \theta)}{\partial \theta}\right|_{\theta=0} = 0$

implied by (11) and (12) above cannot, in general, be satisfied, since the
distribution is discontinuous, i.e. $P\{E \in w \mid H_0\}$ is a discontinuous function of w and, in
fact, for a given sample size, has only a finite number of possible values, none
of which need be equal to $\alpha$.

However, it may be possible to find a test of $H_0$ of a type called “unbiased
in the limit (as M increases),” based on the limiting form of the multinomial
distribution which is a continuous function of w. The definition [6] of a test
“unbiased in the limit” will be taken as follows:

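Suppose we have a sequence $(w_M)$ of critical regions, $w_M$ corresponding to a
sample of size M, such that

(i) for any M, if $v_M$ be any region for which

(15) $\beta(w_M \mid 0) = \beta(v_M \mid 0)$

and

(16) $\left.\dfrac{\partial \beta(w_M \mid \theta)}{\partial \theta}\right|_{\theta=0} = \left.\dfrac{\partial \beta(v_M \mid \theta)}{\partial \theta}\right|_{\theta=0}$

then

(17) $\left.\dfrac{\partial^2 \beta(w_M \mid \theta)}{\partial \theta^2}\right|_{\theta=0} > \left.\dfrac{\partial^2 \beta(v_M \mid \theta)}{\partial \theta^2}\right|_{\theta=0},$

(ii)

(18) $\lim_{M \to \infty} \beta(w_M \mid 0) = \alpha,$

(iii) if

(19) $\vartheta = \sqrt{M}\,(\theta - 0) = \sqrt{M}\,\theta,$

then

(20) $\lim_{M \to \infty} \left.\dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = 0;$

then the test associated with this sequence of critical regions is unbiased in the
limit. I shall call such a test a test of type $A_\infty$.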
The reason for using $\vartheta$ as the variable in condition (19) above is that, unless
our sequence of critical regions has been very badly or unluckily chosen, we
shall have

(21) $\lim_{M \to \infty} \beta(w_M \mid \theta) = 1 \quad (\theta \ne 0)$

while, by (18), $\lim_{M \to \infty} \beta(w_M \mid 0) = \alpha$ and so, in general,
$\lim_{M \to \infty} \dfrac{\partial \beta(w_M \mid \theta)}{\partial \theta}$ will not
exist at $\theta = 0$. Hence we introduce $\vartheta$, termed the normalized error, and, keeping
$\vartheta$ constant (and hence making $\theta$ tend to zero) we form
$\lim_{M \to \infty} \dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}$.

In the next section will be obtained a test of $H_0$ which is of type $A_\infty$.

4. Derivation of Test. The composition of a sample of M experiments is
uniquely determined by the numbers of results $m_1, m_2, m_3$ falling in the 1st,
2nd and 3rd categories respectively. Thus any sample may be represented by a
point E(m) in a three-dimensional sample space W(m) with coordinate axes of
$m_1$, $m_2$, and $m_3$. It will occasionally be convenient to represent the sample
by a point in a three-dimensional space with other axes. The following sample
spaces will be used.

    W(m): space with coordinate axes of $m_1, m_2, m_3$
    W(d): space with coordinate axes of $d_1, d_2, d_3$
    W(x): space with coordinate axes of $x_1, x_2, x_3$
    W(n): space with coordinate axes of $n_1, n_2, n_3$

where

(22) $d_i = m_i - Mp_{0i} \quad (i = 1, 2, 3, 4)$
(23) $x_i = (m_i - Mp_{0i})/(Mp_{0i})^{1/2} \quad (i = 1, 2, 3, 4)$
(24) $n_i = m_i/M \quad (i = 1, 2, 3, 4).$

I shall use $w_M$ indifferently to denote “the critical region corresponding to
sample size M” in any of the four sample spaces above; E indifferently to
denote corresponding positions of the sample point in any of the four sample
spaces: except in cases where confusion might arise, where I shall use $w_M(m)$,
$w_M(d)$, $w_M(x)$, $w_M(n)$ and E(m), E(d), E(x), E(n). When necessary the size of
sample with which a point E is associated will be denoted by a subscript; e.g. $E_M$.

In finding a test of type $A_\infty$ we shall need to consider the quantities

$\beta(w_M \mid 0), \quad \left.\dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0}, \quad$ and $\quad \left.\dfrac{\partial^2 \beta(w_M \mid \theta)}{\partial \theta^2}\right|_{\theta=0}, \quad$ where $\vartheta = \theta\sqrt{M}.$

The probability law of the observed values $m_1, m_2, m_3$ is discontinuous with
respect to the points of the sample space $W_M$. For if $E'$ be a point which
corresponds to integral values $m_1', m_2', m_3'$ of $m_1, m_2, m_3$, subject to the
restrictions

(25) $0 \le m_i' \quad (i = 1, 2, 3)$
(26) $\sum_{i=1}^{3} m_i' \le M$

then

(27) $P\{E_M = E' \mid \theta = 0\} = \dfrac{M!}{m_1'!\,m_2'!\,m_3'!\,m_4'!}\; p_{01}^{m_1'}\, p_{02}^{m_2'}\, p_{03}^{m_3'}\, p_{04}^{m_4'}$

where

(28) $m_4' = M - \sum_{i=1}^{3} m_i'$

and

(29) $p_{01} = P_1P_2, \qquad p_{02} = P_1(1 - P_2),$
     $p_{03} = P_2(1 - P_1), \qquad p_{04} = (1 - P_1)(1 - P_2),$

while if $E''$ be not such a point

(30) $P\{E_M = E'' \mid \theta\} = 0$

whatever the value of $\theta$ may be. Now

(31) $\beta(w_M \mid \theta) = \sum_{w_M} \dfrac{M!}{m_1!\,m_2!\,m_3!\,m_4!}\; p_1^{m_1}\, p_2^{m_2}\, p_3^{m_3}\, p_4^{m_4}$

where $p_1, p_2, p_3, p_4$ are as defined in (2) above, and $\sum_{w_M}$ denotes a finite
summation over all points $E'$ in $w_M$ for which $P\{E_M = E' \mid \theta\} \ne 0$. Differentiating
each side of (31) with respect to $\theta$, we get


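(32) $\left.\dfrac{\partial \beta(w_M \mid \theta)}{\partial \theta}\right|_{\theta=0} = \sum_{w_M} \dfrac{M!}{m_1!\,m_2!\,m_3!\,m_4!}\; p_{01}^{m_1} p_{02}^{m_2} p_{03}^{m_3} p_{04}^{m_4} \times \dfrac{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2}{(1 - P_1)(1 - P_2)}$

and

(33) $\left.\dfrac{\partial^2 \beta(w_M \mid \theta)}{\partial \theta^2}\right|_{\theta=0} = \sum_{w_M} \dfrac{M!}{m_1!\,m_2!\,m_3!\,m_4!}\; p_{01}^{m_1} p_{02}^{m_2} p_{03}^{m_3} p_{04}^{m_4} \times \dfrac{1}{(1 - P_1)^2(1 - P_2)^2}$
     $\times \big(\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2\}^2$
     $\quad - \{m_1P_1P_2(1 - P_1 - P_2) + m_2P_2(1 - P_1 - P_1P_2)$
     $\quad + m_3P_1(1 - P_2 - P_1P_2) - MP_1P_2(1 - P_1)(1 - P_2)\}\big).$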
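The bracketed factor in (32) is the derivative of the logarithm of the multinomial probability with respect to $\theta$ at $\theta = 0$. A minimal numerical sketch of that identity follows; it is an illustration rather than part of the original derivation, the function names are assumptions, and the counts are those quoted later in the paper's Example.

```python
import math

def log_likelihood(m, P1, P2, theta):
    """Log of the multinomial kernel p1^m1 p2^m2 p3^m3 p4^m4 with p_i from (2)."""
    e = math.exp(theta)
    p = (P1 * P2 * e, P1 * (1 - P2 * e), P2 * (1 - P1 * e),
         1 - P1 - P2 + P1 * P2 * e)
    return sum(mi * math.log(pi) for mi, pi in zip(m, p))

def score_at_zero(m, P1, P2):
    """Closed form of the factor multiplying each term of the sum in (32)."""
    m1, m2, m3, m4 = m
    M = m1 + m2 + m3 + m4
    S = m1 * (1 - P1 - P2) - m2 * P2 - m3 * P1 + M * P1 * P2
    return S / ((1 - P1) * (1 - P2))

m = (32, 904, 906, 1997)          # counts from the Example in Section 5
h = 1e-6                          # step for a central finite difference
numeric = (log_likelihood(m, 0.25, 0.25, h)
           - log_likelihood(m, 0.25, 0.25, -h)) / (2 * h)
analytic = score_at_zero(m, 0.25, 0.25)
```

The two values should agree to within the finite-difference error, which supports the reading of the numerator in (32) as $m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2$.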
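THEOREM 1. The sequence of critical regions $(w_M)$ defined by

(34) $v + Bu^2 > A$ in $w_M$; $\quad v + Bu^2 < A$ elsewhere,

where

(35) $u = \dfrac{x_1(P_1P_2)^{1/2}(1 - P_1 - P_2) - x_2P_1^{1/2}(1 - P_2)^{1/2}P_2 - x_3P_2^{1/2}(1 - P_1)^{1/2}P_1}{\{P_1P_2(1 - P_1)(1 - P_2)\}^{1/2}}$

(36) $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)\{x_1(P_1P_2)^{1/2} + x_3P_2^{1/2}(1 - P_1)^{1/2}\} + P_2(1 - P_2)(2P_1 - 1)\{x_1(P_1P_2)^{1/2} + x_2P_1^{1/2}(1 - P_2)^{1/2}\}}{[P_1P_2(1 - P_1)(1 - P_2)\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}]^{1/2}}$

(37) $B = \left\{\dfrac{MP_1P_2(1 - P_1)(1 - P_2)}{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2}\right\}^{1/2}$

(38) $\dfrac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-\frac12 u^2}\left\{\int_{A - Bu^2}^{\infty} e^{-\frac12 v^2}\,dv\right\} du = \alpha$

and $x_i = \dfrac{m_i - Mp_{0i}}{(Mp_{0i})^{1/2}}$ as defined above, is associated with a test of the hypothesis
$H_0(\theta = 0)$ which is unbiased in the limit, of type $A_\infty$, at level of significance $\alpha$,
provided that

(39) $0 < P_i < 1 \quad (i = 1, 2)$

and $P_1$ and $P_2$ are not both equal to $\tfrac12$.

In Lemma 1 of the Appendix (paragraph 9), put s = 2, and let

$f_1$ = individual members of the summation for $\beta(w_M \mid 0)$ (see (31)),

$f_2$ = individual members of the summation for $\left.\dfrac{\partial \beta(w_M \mid \theta)}{\partial \theta}\right|_{\theta=0}$ (see (32)),

$f_3$ = individual members of the summation for $\left.\dfrac{\partial^2 \beta(w_M \mid \theta)}{\partial \theta^2}\right|_{\theta=0}$ (see (33)).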


From LEMMA 1 we see that the regions (w) defined by

(40) $f_3 > a_1f_1 + a_2f_2$ in w; $\quad f_3 < a_1f_1 + a_2f_2$ elsewhere

will maximize $\sum_w f_3$ with respect to all regions for which $\sum_w f_1$ and $\sum_w f_2$ are fixed.
($a_1$ and $a_2$ are arbitrary constants depending on the fixed values of $\sum_w f_1$ and
$\sum_w f_2$). Hence any sequence of critical regions $(w_M)$ defined by

(41) $\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2\}^2$
     $- \{m_1P_1P_2(1 - P_1 - P_2) + m_2P_2(1 - P_1 - P_1P_2)$
     $+ m_3P_1(1 - P_2 - P_1P_2) - MP_1P_2(1 - P_1)(1 - P_2)\}$
     $> a_1\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2\} + a_2$

in $w_M$, will satisfy conditions (i) given above in the definition of a test of
type $A_\infty$. The inequality (41) may be rewritten

(42) $\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2 - a_3\}^2$
     $- [P_2(1 - P_1)\{m_2 - MP_1(1 - P_2)\} + P_1(1 - P_2)\{m_3 - MP_2(1 - P_1)\}] > a_4$

the $a_i$'s being arbitrary constants.

Also, by THEOREM 1 of the Appendix, we have that, for any given $\epsilon > 0$
and any region w, there is a number $M_\epsilon$ independent of w and such that for all
$M > M_\epsilon$,

(43) $|\beta(w \mid 0) - I(w)| < \epsilon$

where

(44) $I(w) = (2\pi)^{-3/2}\, p_{04}^{-1/2} \iiint_w e^{-\frac12\chi_0^2}\, dx_1\, dx_2\, dx_3$

and

(45) $\chi_0^2 = \sum_{i=1}^{3} x_i^2(1 + p_{0i}p_{04}^{-1}) + 2\sum_{i<j} x_ix_j(p_{0i}p_{0j})^{1/2}p_{04}^{-1}.$





We will now apply a transformation to the coordinates $m_1, m_2, m_3$ which will

(a) transform inequality (42) into a simpler form,

(b) transform I(w) into a form to which the tables of the Normal Probability
Integral may easily be applied for purposes of calculation.

This transformation is

(46) $u = \dfrac{x_1(P_1P_2)^{1/2}(1 - P_1 - P_2) - x_2P_1^{1/2}(1 - P_2)^{1/2}P_2 - x_3P_2^{1/2}(1 - P_1)^{1/2}P_1}{\{P_1P_2(1 - P_1)(1 - P_2)\}^{1/2}}$

(47) $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)\{x_1(P_1P_2)^{1/2} + x_3P_2^{1/2}(1 - P_1)^{1/2}\} + P_2(1 - P_2)(2P_1 - 1)\{x_1(P_1P_2)^{1/2} + x_2P_1^{1/2}(1 - P_2)^{1/2}\}}{[P_1P_2(1 - P_1)(1 - P_2)\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}]^{1/2}}$

(48) $t = \dfrac{(2P_1 - 1)\{x_1(P_1P_2)^{1/2} + x_3P_2^{1/2}(1 - P_1)^{1/2}\} - (2P_2 - 1)\{x_1(P_1P_2)^{1/2} + x_2P_1^{1/2}(1 - P_2)^{1/2}\}}{\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}^{1/2}}$

This is a proper transformation, since under the conditions of the theorem
$0 < P_i < 1$ and $P_1$ and $P_2$ are not both $\tfrac12$; and the Jacobian

(49) $J = \dfrac{\partial(u, v, t)}{\partial(x_1, x_2, x_3)}$

is non-zero and of constant sign.
Also

(50) $\chi_0^2 = u^2 + v^2 + t^2.$

Hence

(51) $I(w) = \dfrac{1}{(2\pi)^{3/2}} \iiint_{w(u,v,t)} e^{-\frac12(u^2 + v^2 + t^2)}\, du\, dv\, dt.$








The inequality (42) is transformed into an inequality of form $B(u - a_5)^2 + v > A$
where B has the value stated above; $a_5$ and A being at present arbitrary
constants.

Therefore we may put $a_5 = 0$ and define A by the equation

(52) $\dfrac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-\frac12 u^2}\left\{\int_{A - Bu^2}^{\infty} e^{-\frac12 v^2}\,dv\right\} du = \alpha$

and conclude that the sequence of critical regions $(w_M)$ defined by the
inequalities

(53) $Bu^2 + v > A$ in $w_M$; $\quad Bu^2 + v < A$ elsewhere

will satisfy conditions (i) for a test of type $A_\infty$.
From (51) and (52)

(54) $I(w_M) = \dfrac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-\frac12 u^2}\left\{\int_{A - Bu^2}^{\infty} e^{-\frac12 v^2}\,dv\right\} du = \alpha.$

By THEOREM 1 of the Appendix, as mentioned above, we have

(55) $|\beta(w_M \mid 0) - I(w_M)| < \epsilon$ for all $M > M_\epsilon$

i.e.

(56) $|\beta(w_M \mid 0) - \alpha| < \epsilon$ for all $M > M_\epsilon$

and so

(57) $\beta(w_M \mid 0) \to \alpha$ as $M \to \infty.$

Thus the sequence of critical regions $(w_M)$ satisfies the condition (ii) of the
definition of a test of type $A_\infty$.

If w be any region defined by inequalities on u and v only (as are the regions
$w_M$) then, as a special case of THEOREM 1 of the Appendix, we have that for
any $\epsilon > 0$ there exists a number $M_\epsilon$ such that for all $M > M_\epsilon$

(58) $\left| P_M(w) - \dfrac{1}{2\pi}\iint_{w(u,v)} e^{-\frac12(u^2 + v^2)}\, du\, dv \right| < \epsilon$

where $P_M(w) = P\{E_M \in w \mid 0\}$.
By (31) and (32), noting that $\dfrac{\partial}{\partial \theta} = \sqrt{M}\,\dfrac{\partial}{\partial \vartheta}$, we have

(59) $\left.\dfrac{\partial \beta(w \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = \sum_w f_1(u, v)\cdot u\cdot(P_1P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2} = \sum_w f_1(u, v)\cdot u\,k$

where $k = (P_1P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2} > 0$.
By THEOREM 1 of the Appendix, as last stated above, we have

(60) $f_1(u, v) = \dfrac{\Delta u\,\Delta v}{2\pi}\, e^{-\frac12(u^2 + v^2)}(1 + R_M)$

where for convenience we have written $\Delta u$, $\Delta v$ for $\Delta_{(M)}u$, $\Delta_{(M)}v$, the units of u
and v when sample size is M, and $R_M$ for $R_M(u, v)$ which has the property that

(61) $\sum_w R_M(u, v)\,\dfrac{\Delta u\,\Delta v}{2\pi}\, e^{-\frac12(u^2 + v^2)} \to 0$

uniformly with respect to w as $M \to \infty$.


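Now let $w^+$ denote that part of w where $R_M > 0$ and $w^-$ that part of w where
$R_M < 0$. Then

(62) $\sum_w k\,u\,f_1(u, v) = \sum_w k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u\, e^{-\frac12(u^2 + v^2)} + \sum_w k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-\frac12(u^2 + v^2)}.$

Let

(63) $S_M^+ = \sum_{w^+} k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-\frac12(u^2 + v^2)}.$

By Schwarz's inequality

(64) $|S_M^+| \le k\left\{\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-\frac12(u^2 + v^2)}\right\}^{1/2}\left\{\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, R_M\, e^{-\frac12(u^2 + v^2)}\right\}^{1/2}.$

But

(65) $\sum_{w^+} u^2 f_1(u, v) = \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 e^{-\frac12(u^2 + v^2)} + \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-\frac12(u^2 + v^2)}.$

Now $u^2 f_1(u, v) > 0$ and $\sum_W u^2 f_1(u, v)$ is finite (since $u^2$ is a homogeneous function
of second degree in the $x_i$'s and so has a finite expectation) and is bounded
as $M \to \infty$. Hence $\sum_{w^+} u^2 f_1(u, v)$ is finite and bounded as $M \to \infty$. Further,
as $M \to \infty$

(66) $\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 e^{-\frac12(u^2 + v^2)} \to \dfrac{1}{2\pi}\iint_{w^+} u^2 e^{-\frac12(u^2 + v^2)}\, du\, dv.$

Hence $\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-\frac12(u^2 + v^2)}$ is bounded as $M \to \infty$. From this result,
together with (61) and (64) it follows that $S_M^+ \to 0$ as $M \to \infty$ uniformly with
respect to w. Putting

(67) $S_M^- = \sum_{w^-} k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-\frac12(u^2 + v^2)}$

it will follow in a similar manner that $S_M^- \to 0$ as $M \to \infty$ uniformly with
respect to w. Hence

(68) $\left.\dfrac{\partial \beta(w \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = \sum_w k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u\, e^{-\frac12(u^2 + v^2)} + S_M$

where $S_M = S_M^+ + S_M^-$ and so $S_M \to 0$ as $M \to \infty$ uniformly with respect to w.
Hence whatever be $\epsilon > 0$, there is a number $M_\epsilon'$ such that for all $M > M_\epsilon'$

(69) $\left| \left.\dfrac{\partial \beta(w \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} - \dfrac{k}{2\pi}\iint_w u\, e^{-\frac12(u^2 + v^2)}\, du\, dv \right| < \epsilon$

whatever be the region w. In particular we may take $w = w_M$, and then
we have

(70) $\dfrac{k}{2\pi}\iint_{w_M} u\, e^{-\frac12(u^2 + v^2)}\, du\, dv = \dfrac{k}{2\pi}\int_{-\infty}^{+\infty}\left\{u\, e^{-\frac12 u^2}\int_{A - Bu^2}^{\infty} e^{-\frac12 v^2}\, dv\right\} du = 0$

and so

(71) $\left| \left.\dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} \right| < \epsilon$ for all $M > M_\epsilon'$

i.e.,

(72) $\lim_{M \to \infty} \left.\dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = 0.$

Hence the sequence of critical regions $(w_M)$ satisfies condition (iii) for a test
of type $A_\infty$. This completes the proof of THEOREM 1.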

In the above theorem we have found a test which is unbiased in the limit for
all cases except that for which $P_1 = P_2 = \tfrac12$. The following theorem derives
the test appropriate to this special case, and it is found that in this instance the
test takes a very simple form.


THEOREM 2. If $P_1 = P_2 = \tfrac12$, the sequence of critical regions $(w_M)$ defined by

(73) $|x_2 + x_3| > a$ in $w_M$; $\quad |x_2 + x_3| < a$ elsewhere

where

(74) $\dfrac{1}{\sqrt{2\pi}}\int_{-a}^{+a} e^{-\frac12 t^2}\, dt = 1 - \alpha$

(75) $x_i = \dfrac{m_i - \tfrac14 M}{(\tfrac14 M)^{1/2}} \quad (i = 2, 3),$

is associated with a test of the hypothesis $H_0(\theta = 0)$ of type $A_\infty$ at level of
significance $\alpha$.


The proof of this theorem follows the same lines as that of Theorem 1 as far
as inequality (42). On putting $P_1 = P_2 = \tfrac12$ in (42) we get

(76) $\{-\tfrac12 m_2 - \tfrac12 m_3 + \tfrac14 M - a_3\}^2 - \tfrac14(m_2 + m_3 - \tfrac12 M) > a_4$

or

(77) $(x_2 + x_3 - a_5)^2 > a_6.$

The critical region $w_M$ defined in the statement of the theorem is of this
form with $a_5 = 0$ and $a_6 = a^2$.

Hence the sequence of critical regions $(w_M)$ satisfies conditions (i) of the
definition of a test of type $A_\infty$. The sequence of critical regions may also be
shown to satisfy conditions (ii) and (iii) for a test of type $A_\infty$ by following the
lines of the proof of THEOREM 1 and noting that $x_2 + x_3 = 2M^{-1/2}(m_2 + m_3 - \tfrac12 M)$
tends to be distributed as a unit normal deviate as $M \to \infty$.

On account of the shape of the critical regions in the general case, I shall for
the remainder of this paper call the tests derived in the above theorems the
parabolic tests for the cases considered.


5. Application of the Parabolic Tests. For practical purposes the formulae
derived above are inconvenient to use. I will therefore express them in terms
of the deviations of the observed frequencies in the four cells from the
frequencies “expected” when the hypothesis $H_0(\theta = 0)$ is true, i.e. in terms of the
variables $d_i$, where

(78) $d_i = m_i - Mp_{0i} = x_i(Mp_{0i})^{1/2} \quad (i = 1, 2, 3, 4).$

The test then becomes “reject the hypothesis $H_0$ at level of significance $\alpha$ if
$v + Bu^2 > A$” where

(79) $u = \dfrac{d_1(1 - P_1 - P_2) - d_2P_2 - d_3P_1}{\{MP_1P_2(1 - P_1)(1 - P_2)\}^{1/2}}$

(80) $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)(d_1 + d_3) + P_2(1 - P_2)(2P_1 - 1)(d_1 + d_2)}{[MP_1P_2(1 - P_1)(1 - P_2)\{P_1(1 - P_1)(2P_2 - 1)^2 + P_2(1 - P_2)(2P_1 - 1)^2\}]^{1/2}}$

(81) $\dfrac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-\frac12 u^2}\left\{\int_{A - Bu^2}^{\infty} e^{-\frac12 v^2}\, dv\right\} du = \alpha$

(82) $B = \left\{\dfrac{MP_1P_2(1 - P_1)(1 - P_2)}{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2}\right\}^{1/2}$

except when $P_1 = P_2 = \tfrac12$. In the latter case reject the hypothesis $H_0$ if

(83) $|d_2 + d_3| > \tfrac12 a M^{1/2}$

where

(84) $\dfrac{1}{\sqrt{2\pi}}\int_{-a}^{+a} e^{-\frac12 t^2}\, dt = 1 - \alpha.$

The application of this last case ($P_1 = P_2 = \tfrac12$) is straightforward. a may be
found from the tables of the Normal Probability Integral. $d_2$ and $d_3$ may be
calculated from the data, and we may then see whether the inequality (83) is
satisfied, and so assess our judgment of the hypothesis $H_0$.
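As a rough illustration of the special case $P_1 = P_2 = \tfrac12$, the sketch below takes the rejection rule in the equivalent form $|x_2 + x_3| > a$ from Theorem 2 and obtains a from the standard library's normal quantile function instead of printed tables. The function name and the two sets of counts are hypothetical and serve only to exercise the rule.

```python
import math
from statistics import NormalDist

def half_case_test(m, alpha):
    """Special-case parabolic test for P1 = P2 = 1/2.

    Returns (reject, statistic, a), where statistic = x2 + x3 = 2 (d2 + d3) / sqrt(M)
    and a solves (84), i.e. a is the two-sided normal critical value.
    """
    m1, m2, m3, m4 = m
    M = m1 + m2 + m3 + m4
    a = NormalDist().inv_cdf(1 - alpha / 2)
    d2_plus_d3 = (m2 + m3) - M / 2            # expected cell counts are M/4 each
    statistic = 2 * d2_plus_d3 / math.sqrt(M)
    return abs(statistic) > a, statistic, a

# Hypothetical counts, for illustration only.
balanced = half_case_test((25, 25, 25, 25), 0.05)
lopsided = half_case_test((40, 10, 10, 40), 0.05)
```

With the balanced counts the statistic is zero and the hypothesis is retained; with the lopsided counts the statistic is far outside the critical value.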


TABLE I
Significance of Symbols
A and B are connected by the following relation:

$\dfrac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-\frac12 u^2}\left\{\int_{A - Bu^2}^{\infty} e^{-\frac12 v^2}\, dv\right\} du = \alpha.$

Table Ia                                Table Ib
$\alpha = 0.05$                         $\alpha = 0.01$
$\rho_{.05} = A - 3.8414588\,B$         $\rho_{.01} = A - 6.6348966\,B$

B                                       B

The general case is also straightforward, except for the determination of A
from equation (81). To facilitate this I have constructed Tables Ia and Ib.
These tables correspond respectively to significance levels .05, .01, and from
them the value of A corresponding to a given value of B may be calculated.
The quantity tabled, ($\rho$), is the difference between A and a multiple¹ (constant
for a given level of significance and given with the table to which it applies) of
B. To find A, therefore, B is calculated, multiplied by the appropriate
constant, and added to the quantity in the table corresponding to B. For large
values of B (40 and over) $\rho$ is small, and A may be taken equal to the constant
multiple of B.
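Where the printed tables are not to hand, A can be approximated directly from equation (81). The following is a minimal sketch under stated assumptions: the inner integral is written with the complementary error function, the outer integral is evaluated by Simpson's rule on a truncated range, and A is found by bisection. The function names, truncation width, step count and tolerance are all illustrative choices, not taken from the paper.

```python
import math

def rejection_probability(A, B, half_width=8.0, steps=16000):
    """Left side of (81): P(v + B*u**2 > A) for independent unit normals u, v."""
    h = half_width / steps                    # Simpson step on [0, half_width]
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        density = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        tail = 0.5 * math.erfc((A - B * u * u) / math.sqrt(2))   # P(v > A - B u^2)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * density * tail
    return 2 * total * h / 3                  # factor 2: the integrand is even in u

def solve_A(B, alpha, tol=1e-6):
    """Bisection for A; the rejection probability decreases as A increases."""
    lo, hi = -10.0, 40.0 * B + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rejection_probability(mid, B) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For B = 37.94239 and $\alpha = 0.05$, `solve_A` should land close to the value A = 145.7615 quoted in the Example; for B = 0 it reduces to the one-sided normal critical value.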

In particular cases when the values of $P_1$ and $P_2$ are substituted in the
expression for B (see THEOREM 1 above) and in (79) and (80) above, these equations
appear much less formidable. Thus in the case considered by R. A. Fisher
[3], $P_1 = P_2 = \tfrac14$ and we get

(85) $B = \sqrt{\dfrac{3M}{8}}; \quad u = \tfrac43 M^{-1/2}(2d_1 - d_2 - d_3); \quad v = -4(6M)^{-1/2}(2d_1 + d_2 + d_3)$

and the test becomes “reject the hypothesis $H_0$ at level of significance $\alpha$ when

(86) $\phi = \{(2d_1 - d_2 - d_3)^2 - \tfrac32(2d_1 + d_2 + d_3)\}/\{\tfrac38(6M)^{1/2}\} > A$”

where

(87) $\dfrac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-\frac12 u^2}\left\{\int_{A - u^2\sqrt{3M/8}}^{\infty} e^{-\frac12 v^2}\, dv\right\} du = \alpha.$

Example. Fisher [3] gives an example of the case $P_1 = P_2 = \tfrac14$. In the
series of experiments that he quotes the observed results fall in the four
categories respectively as follows:

$m_1 = 32$; $m_2 = 904$; $m_3 = 906$; $m_4 = 1997$. $M = 3839$.

Hence $d_1 = -207.9375$; $d_2 + d_3 = 370.375$. From (86), $\phi = 10863.1$. $B = 37.94239$.
From the tables:

at .05 level, $A_{.05} = 3.8414588 \times 37.94239 + 0.0075 = 145.7615$

at .01 level, $A_{.01} = 6.6348966 \times 37.94239 + 0.0065 = 251.750.$

Hence we reject the hypothesis that $\theta = 0$, i.e. that there is no linkage, since
the value of $\phi$ is well outside even the .01 level of significance.
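The arithmetic of the Example can be reproduced from the general formulae (79), (80) and (82). The sketch below is illustrative; the function name `parabolic_statistic` is an assumption, and the routine is not valid when $P_1 = P_2 = \tfrac12$, where the denominator `spread` vanishes.

```python
import math

def parabolic_statistic(m, P1, P2):
    """Return (u, v, B, phi) with phi = v + B*u**2, following (79), (80), (82)."""
    m1, m2, m3, m4 = m
    M = m1 + m2 + m3 + m4
    Q1, Q2 = 1 - P1, 1 - P2
    expected = (M * P1 * P2, M * P1 * Q2, M * P2 * Q1, M * Q1 * Q2)
    d1, d2, d3, _ = (mi - ei for mi, ei in zip(m, expected))
    base = M * P1 * P2 * Q1 * Q2
    # Zero only when P1 = P2 = 1/2, the case excluded from the general test.
    spread = P1 * Q1 * (2 * P2 - 1) ** 2 + P2 * Q2 * (2 * P1 - 1) ** 2
    u = (d1 * (1 - P1 - P2) - d2 * P2 - d3 * P1) / math.sqrt(base)
    v = (P1 * Q1 * (2 * P2 - 1) * (d1 + d3)
         + P2 * Q2 * (2 * P1 - 1) * (d1 + d2)) / math.sqrt(base * spread)
    B = math.sqrt(base / spread)
    return u, v, B, v + B * u * u

# Counts quoted in the Example, with P1 = P2 = 1/4.
u, v, B, phi = parabolic_statistic((32, 904, 906, 1997), 0.25, 0.25)
```

The resulting B and $\phi$ should agree with the values 37.94239 and 10863.1 given in the text, and $\phi$ can then be compared with $A_{.05}$ and $A_{.01}$.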


6. Power Function of the Tests. General Case. The parabolic test as
described above has the desirable property that of all tests (at level of significance
$\alpha$) which are unbiased for large values of M this test will detect small variations
in $\theta$ most frequently. However, to get a clearer idea of the properties of this
test we shall calculate, as accurately as may be practicable, the power function
of the test.

¹ This multiple is equal to $k_\alpha^2$ where $\dfrac{1}{\sqrt{2\pi}}\int_{-k_\alpha}^{+k_\alpha} e^{-\frac12 t^2}\, dt = 1 - \alpha$, $\alpha$ being the level of
significance.

As a preliminary step we obtain a rough idea of the power function by making
use of the concept of a limiting power function as stated by Neyman [6]. This
may be defined as follows:

Let $E_{M'}$ denote the sample point corresponding to a sample of size M', and put

(88) $P\{E_{M'} \in w \mid \theta\} = \beta_{M'}(w \mid \vartheta'),$

where $\vartheta' = \sqrt{M'}\,\theta$, w being a fixed region. Supposing $\vartheta'$ kept fixed, let M' increase
and let

(89) $\beta_\infty(w \mid \vartheta') = \lim_{M' \to \infty} \beta_{M'}(w \mid \vartheta')$

if this limit exists.

Then $\beta_\infty(w \mid \vartheta')$ is the limiting power function of the test associated with the critical
region w. It will be noted that the limiting power function is a function of $\vartheta'$.

In the problem under consideration the parabolic test when the sample size
is M is associated with the critical region $w_M$. Now it should be noted that
in the definition of the limiting power function w remains fixed. Therefore
the limiting power function of the parabolic test for sample size M is

(90) $\beta_\infty(w_M \mid \vartheta') = \lim_{M' \to \infty} \beta_{M'}(w_M \mid \vartheta').$

The significance of the limiting power function is that for any $\epsilon > 0$ and for
any $\vartheta'$ there is a number $M_{\epsilon,\vartheta'}$ such that for all $M > M_{\epsilon,\vartheta'}$ we have in our case
(by THEOREM 1 of the Appendix)

(91) $|\beta_M(w_M \mid \vartheta') - \beta_\infty(w_M \mid \vartheta')| < \epsilon.$

It should be noted, however, that the limiting power curve (the graph of the
limiting power function against $\theta = \vartheta' M^{-1/2}$) may be only a very rough
approximation to the actual power curve. Furthermore (Neyman, [6, p. 83]) we
cannot, in general, use the limiting power function of a test to answer the question:

“How large must we take our sample size M to detect the falsehood of the
hypothesis $H_0(\theta = 0)$ when actually $\theta = \theta'$, with a limiting probability of at
least, say, 0.95?”

For if we form a table as below

    M        $\vartheta'_{(M)} = M^{1/2}\theta'$        $\beta_\infty(w \mid \vartheta'_{(M)})$
    100      ...                    ...
    1000

it is possible that $\beta_\infty(w \mid \vartheta'_{(M)})$ may never attain the value 0.95.

THEOREM 3. The limiting power function of the parabolic test is

(92) $\beta_\infty(w_M \mid \vartheta) = \dfrac{1}{2\pi}\int_{-\infty}^{+\infty}\int_{A - Bu^2}^{\infty} e^{-\frac12\{[u - \vartheta(P_1P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2}]^2 + v^2\}}\, dv\, du$

in all cases for which $0 < P_i < 1$ and $P_1$ and $P_2$ are not both equal to $\tfrac12$.

The proof of this theorem follows immediately from THEOREM 1 of the
Appendix by applying the transformation (46)-(48) and putting $\lambda = P_1P_2$.

The above remarks concerning special precautions to be taken with respect
to the limiting power function suggest the necessity of studying the actual
power function of the parabolic test by some other method.

With this object in view, a study was made of the distribution of the function
$\phi = v + Bu^2$ for finite values of M and in particular for M = 100 and M = 3839.
$\phi$ is a discontinuous variate and, for any given value of M, has definite limits
of variation arising from the limitations on the values of the variables $m_i$ stated
in the inequalities (25), (26) above. These limits of variation of $\phi$ were found
to be

(93) $-\tfrac83(6M)^{-1/2}\left(\tfrac98 M + \tfrac{1}{16}\right) < \phi < \tfrac83(6M)^{-1/2}\,\tfrac{9}{16}M(9M - 4)$

for the case $P_1 = P_2 = \tfrac14$. Hence when

    M = 100,     $-12.25 < \phi < 5486.86,$
    M = 3839,    $-75.89 < \phi < 1310795.75.$

Also it was found that

(94) $\mathcal{E}(\phi \mid \theta) = B\left\{1 + \dfrac{(1 - 2P_1)(1 - 2P_2)}{(1 - P_1)(1 - P_2)}(e^{\theta} - 1) + \dfrac{(M - 1)P_1P_2}{(1 - P_1)(1 - P_2)}(e^{\theta} - 1)^2\right\}$

where $\mathcal{E}(\phi \mid \theta)$ denotes the expected value of $\phi$, given the value of the parameter
$\theta$. Thus when $P_1 = P_2 = \tfrac14$ we have $B = \sqrt{3M/8}$ and so $\mathcal{E}(\phi \mid 0) = \sqrt{3M/8}$.
Hence when

    M = 100,     $\mathcal{E}(\phi \mid 0) = 6.12372,$
    M = 3839,    $\mathcal{E}(\phi \mid 0) = 37.94239.$

It is thus seen that the distribution of $\phi$ might be represented by a Type III
curve, since the distribution of $\phi$ has a finite lower bound and a very long
positive tail. In order to fit a Type III curve, we must know the second moment
of the curve as well as its lower bound and mean. The general expression for
the second moment about zero is too complicated to be printed and so only the
numerical expressions obtained by giving special values to M are given below.
These are:

(95) (i) M = 100
     $\mathcal{E}(\phi^2 \mid \theta) = 112.41667 + 165.62963(e^{\theta} - 1) + 2493.33333(e^{\theta} - 1)^2$
     $\quad + 1078.00000(e^{\theta} - 1)^3 + 4356.91667(e^{\theta} - 1)^4,$

(96) (ii) M = 3839
     $\mathcal{E}(\phi^2 \mid \theta) = 4318.79213 + 6397.29625(e^{\theta} - 1) + 3684321.24073(e^{\theta} - 1)^2$
     $\quad + 1636267.33255(e^{\theta} - 1)^3 + 261530062.11111(e^{\theta} - 1)^4.$

Using the above results Type III curves were fitted to the distribution of $\phi$,
and approximate values of the power functions $\beta(w_M \mid \theta)$, at level of significance
.05, were calculated. This was obtained by evaluating $P\{\phi > A_{.05} \mid \theta\}$ and
assuming the distribution of $\phi$ to be that given by the fitted curve. Then

(97) $\beta(w_M \mid \theta) = P\{\phi > A_{.05} \mid \theta\}.$

The values obtained for the limiting and approximate power functions are
given in Tables IIa, IIb. Unfortunately the agreement between the two is
not satisfactory.
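The limiting column of Table II can be approximated numerically. The sketch below assumes, as a reading of Theorem 3, that in the limit u is normal with mean $\vartheta k$, where $k = (P_1P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2}$ and $\vartheta = \sqrt{M}\,\theta$, while v remains a unit normal deviate independent of u. The function name, integration range and step count are illustrative assumptions; A and B are taken from the Example.

```python
import math

def limiting_power(theta, M, P1, P2, A, B, half_width=12.0, steps=48000):
    """Approximate P(v + B*u**2 > A) with u ~ N(shift, 1), v ~ N(0, 1) independent."""
    k = math.sqrt(P1 * P2 / ((1 - P1) * (1 - P2)))
    shift = math.sqrt(M) * theta * k          # assumed mean of u under the alternative
    h = 2 * half_width / steps                # Simpson step; steps must be even
    total = 0.0
    for i in range(steps + 1):
        u = shift - half_width + i * h
        density = math.exp(-0.5 * (u - shift) ** 2) / math.sqrt(2 * math.pi)
        tail = 0.5 * math.erfc((A - B * u * u) / math.sqrt(2))   # P(v > A - B u^2)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * density * tail
    return total * h / 3

# Values of A and B from the Example (M = 3839, P1 = P2 = 1/4, .05 level).
power_null = limiting_power(0.00, 3839, 0.25, 0.25, 145.7615, 37.94239)
power_alt = limiting_power(0.05, 3839, 0.25, 0.25, 145.7615, 37.94239)
```

Under this assumption the limiting power depends on $\theta$ only through its absolute value, which matches the symmetry of the limiting columns in Table II; the approximate (Type III) columns are not symmetric.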

Special Case. For the cases $P_1 = P_2 = \tfrac12$ (M = 100, M = 400) power
functions were calculated on the assumption that for a given value of $\theta$, the
random variable $2M^{-1/2}(d_2 + d_3)$ is distributed normally about a mean $-M^{1/2}(e^{\theta} - 1)$
with standard deviation $\sqrt{e^{\theta}(2 - e^{\theta})}$. This is approximately the case for the
values of M considered. The approximate power functions so calculated are
given in Tables IIIa, IIIb.
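The normal approximation just described is simple to evaluate. The following sketch computes the two-sided rejection probability under that approximation; the function name is an assumption, and only the magnitude of the mean matters because the test is two-sided.

```python
import math
from statistics import NormalDist

def approx_power_half_case(theta, M, alpha=0.05):
    """Approximate power of the special-case test (P1 = P2 = 1/2).

    Treats 2 (d2 + d3) / sqrt(M) as normal with mean -sqrt(M) (e^theta - 1)
    and standard deviation sqrt(e^theta (2 - e^theta)); needs theta < log 2.
    """
    a = NormalDist().inv_cdf(1 - alpha / 2)
    e = math.exp(theta)
    mean = -math.sqrt(M) * (e - 1)
    sd = math.sqrt(e * (2 - e))
    dist = NormalDist(mean, sd)
    return dist.cdf(-a) + (1 - dist.cdf(a))

power_100 = approx_power_half_case(-0.30, 100)
power_400 = approx_power_half_case(-0.10, 400)
```

These two values can be compared with the entries for $\theta = -0.30$ in Table IIIa and $\theta = -0.10$ in Table IIIb.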


7. Parabolic Test and $\chi^2$ Test. It is interesting to note the close connection
between the parabolic test and the $\chi^2$ test as introduced for intuitive reasons
and normally used in testing for linkage. The $\chi^2$ test consists of calculating
the quantity

(98) $\chi^2 = \dfrac{1}{MP_1P_2(1 - P_1)(1 - P_2)}\{(1 - P_1)(1 - P_2)m_1 - P_2(1 - P_1)m_2 - P_1(1 - P_2)m_3 + P_1P_2m_4\}^2$

and rejecting the hypothesis $H_0(\theta = 0)$ if $|\chi| > a$ where

(99) $\dfrac{1}{\sqrt{2\pi}}\int_{-a}^{+a} e^{-\frac12 t^2}\, dt = 1 - \alpha.$

In the special case ($P_1 = P_2 = \tfrac12$) the parabolic test and the $\chi^2$ test are
identical; while comparing (98) and (79) we see that in the general case

(100) $u = \chi.$

Hence in the general case the criterion used in the parabolic test may be
written

(101) $\phi = v + B\chi^2.$

(1) Large Samples. For large samples the first term of the expression
$v + B\chi^2$ is usually of small importance, since
v is of form $M^{-1/2} \times$ (linear function of the $d_i$'s), while
$B\chi^2$ is of form $M^{-1/2} \times$ (quadratic function of the $d_i$'s).
For such samples the $\chi^2$ test and parabolic test would appear to be nearly
equivalent.
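The identity (100) between u of (79) and $\chi$ of (98) can be checked numerically. This is a minimal sketch with illustrative function names; the first set of counts is from the Example, the second is hypothetical.

```python
import math

def chi_statistic(m, P1, P2):
    """Signed chi of (98), computed directly from the cell counts."""
    m1, m2, m3, m4 = m
    M = m1 + m2 + m3 + m4
    Q1, Q2 = 1 - P1, 1 - P2
    numerator = Q1 * Q2 * m1 - P2 * Q1 * m2 - P1 * Q2 * m3 + P1 * P2 * m4
    return numerator / math.sqrt(M * P1 * P2 * Q1 * Q2)

def u_statistic(m, P1, P2):
    """u of (79), computed from the deviations d_i = m_i - M p_0i."""
    m1, m2, m3, m4 = m
    M = m1 + m2 + m3 + m4
    Q1, Q2 = 1 - P1, 1 - P2
    d1 = m1 - M * P1 * P2
    d2 = m2 - M * P1 * Q2
    d3 = m3 - M * P2 * Q1
    return (d1 * (1 - P1 - P2) - d2 * P2 - d3 * P1) / math.sqrt(M * P1 * P2 * Q1 * Q2)

fisher = (32, 904, 906, 1997)     # counts from the Example
other = (10, 20, 30, 40)          # hypothetical counts, for illustration only
```

For the Example data both functions give a value near -16.92, whose square, multiplied by B, supplies almost all of $\phi$; this illustrates why the two tests nearly coincide in large samples.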
TABLE II
Limiting and Approximate Power Functions of Parabolic Test
$P_1 = P_2 = \tfrac14$
$-\infty < \theta < 1.386$

Table IIa. M = 100

    θ          Limiting     Approximate
   -2.00                    0.90870
   -1.50       0.99880
   -1.40                    0.77656
   -1.20       0.97915      0.69505
   -1.05       0.93786
   -1.00                    0.58580
   -0.90       0.85024
   -0.75       0.70467      0.42755
   -0.60       0.51532
   -0.45       0.32258      0.21849
   -0.30       0.16986      0.12504
   -0.15       0.07905      0.05689
   -0.10       0.06280      0.04438
   -0.05       0.05318      0.03866
    0.00       0.05000      0.04069
    0.05       0.05318      0.05021
    0.10       0.06280      0.07429
    0.15       0.07905
    0.30       0.16986      0.26559
    0.45       0.32258
    0.60       0.51532      0.75854
    0.75       0.70467      0.94245

Table IIb. M = 3839

    θ          Limiting     Approximate
   -0.25       0.99932      0.99853
   -0.20       0.98502      0.97521
   -0.15       0.87243      0.83620
   -0.10       0.54197      0.52066
   -0.05       0.17827      0.19223
    0.00       0.05000      0.04111
    0.05       0.17827      0.21568
    0.10       0.54197      0.59517
    0.15       0.87243      0.91641
    0.20       0.98502      0.99640
    0.25       0.99932      0.99999

THEOREM 4. The limiting power function of the $\chi^2$ test is

(102) $\beta_\infty(w_{\chi^2} \mid \vartheta) = 1 - \dfrac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-\frac12 [u - \vartheta (P_1 P_2)^{1/2} (1 - P_1)^{-1/2} (1 - P_2)^{-1/2}]^2}\, du$

($w_{\chi^2}$ denotes the region defined by the inequality $|\chi| > a$).
This theorem may be proved by applying (46)-(48) to $Q_0(x_1, x_2, x_3)$ in
THEOREM 1 of the Appendix, and noting that $u = \chi$ by (100).

We notice that $\beta_\infty(w_{\chi^2} \mid \vartheta)$, for a given value of $\vartheta$, has the same value for all
values of M, unlike the limiting power function $\beta_\infty(w_M \mid \vartheta)$ of the parabolic
test. It is this point which accounts for the seeming paradox that, despite the
manner in which the parabolic test was defined, for all values of $\vartheta$ and M

(103) $\beta_\infty(w_{\chi^2} \mid \vartheta) > \beta_\infty(w_M \mid \vartheta)$

as may be deduced from (92) and (102). This does not mean that for any
given $\vartheta$ and all M sufficiently large the power function of the $\chi^2$ test, $\beta_M(w_{\chi^2} \mid \vartheta)$,
is necessarily not less than the power function of the parabolic test, $\beta_M(w_M \mid \vartheta)$.
For although, given any $\epsilon > 0$, there is a number $M_{\epsilon,\vartheta}$ such that if $M > M_{\epsilon,\vartheta}$

(104) $|\beta_M(w_{\chi^2} \mid \vartheta) - \beta_\infty(w_{\chi^2} \mid \vartheta)| < \epsilon$

and

(105) $|\beta_M(w_M \mid \vartheta) - \beta_\infty(w_M \mid \vartheta)| < \epsilon$

it may be that for such values of M

(106) $0 < \beta_\infty(w_{\chi^2} \mid \vartheta) - \beta_\infty(w_M \mid \vartheta) < 2\epsilon$.

The above results show, however, how close the agreement between the power
functions of the two tests is for large values of M. In fact we have

(107) $\lim_{M \to \infty} \beta_\infty(w_M \mid \vartheta) = \beta_\infty(w_{\chi^2} \mid \vartheta)$.

This may be easily proved, since as M increases $w_M$ approximates to $w_{\chi^2}$.

TABLE III
Approximate Power Function
$P_1 = P_2 = \tfrac12$, $-\infty < \theta < 0.693$

Table IIIa. M = 100

    θ      Power
  -0.45    0.96288
  -0.40    0.92161
  -0.35    0.85072
  -0.30    0.74351
  -0.25    0.60197
  -0.20    0.44054
  -0.15    0.28380
  -0.10    0.15727
  -0.05    0.07737
   0.00    0.05000
   0.05    0.08029
   0.10    0.18177
   0.15    0.36464
   0.20    0.60278
   0.25    0.82071
   0.30    0.94975
   0.35    0.99299

Table IIIb. M = 400

    θ      Power
  -0.25    0.99424
  -0.20    0.95482
  -0.15    0.79787
  -0.10    0.47734
  -0.05    0.16378
  -0.02    0.06810
   0.00    0.05000
   0.02    0.06885
   0.05    0.17609
   0.10    0.55737
   0.15    0.90213
   0.20    0.99431
   0.25    0.99995
(2) Small Samples. In order to obtain some idea of the relations between
the two tests when M is small (i.e. less than 100), the case $P_1 = P_2 = \tfrac14$, $M = 32$
was considered in some detail.

In this case our tests at 5% level of significance are respectively

$\chi^2$ test, reject if

(108) $|2y - z| > 8.315$

parabolic test, reject if

(109) $(2y - z)^2 - \tfrac32 (2y + z) > 69.576$

where

(110) $y = d_1$, $z = d_2 + d_3$.


All samples for which the verdicts of the two above tests would not agree
were obtained. These were as follows:

(a) Samples for which $H_0$ is accepted by $\chi^2$ test, rejected by parabolic test.
Probability of drawing sample of this type when $H_0$ is true is 0.00320.

(b) Samples for which $H_0$ is accepted by parabolic test, rejected by $\chi^2$ test

y =  0   1   2   3   5   6   7   8   8   9   9
z =  9  11  13  15   1   3   5   6   7   8   9

Probability of drawing sample of this type when $H_0$ is true is 0.00038.

Thus the probability of the two tests giving different verdicts when $H_0$ is in
fact true is only 0.00358.
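
The enumeration described above can be reproduced with a short script. The following is a minimal sketch under stated assumptions, not the author's procedure: it assumes $P_1 = P_2 = \tfrac14$, takes $d_i$ as the deviation of $m_i$ from its expectation under $H_0$ (so that $y = m_1 - 2$ and $z = m_2 + m_3 - 12$ when $M = 32$), and uses the coefficient $\tfrac32$ for the linear term of (109). All function and variable names are hypothetical. Since both criteria depend only on $m_1$ and $m_2 + m_3$, cells 2 and 3 are pooled.

```python
from math import comb

M = 32
# Null probabilities of (cell 1, cells 2 and 3 pooled, cell 4) when P1 = P2 = 1/4.
P_NULL = (1 / 16, 6 / 16, 9 / 16)

def null_probability(m1, m23):
    """Trinomial probability of the pooled counts under H0."""
    m4 = M - m1 - m23
    return (comb(M, m1) * comb(M - m1, m23)
            * P_NULL[0] ** m1 * P_NULL[1] ** m23 * P_NULL[2] ** m4)

def disagreements():
    """Return two dicts {(y, z): probability}, one per kind of disagreement."""
    parabolic_only, chi_only = {}, {}
    for m1 in range(M + 1):
        for m23 in range(M - m1 + 1):
            y, z = m1 - 2, m23 - 12          # deviations from the null expectations
            chi_rejects = abs(2 * y - z) > 8.315                               # (108)
            parabolic_rejects = (2 * y - z) ** 2 - 1.5 * (2 * y + z) > 69.576  # (109)
            if parabolic_rejects and not chi_rejects:
                parabolic_only[(y, z)] = null_probability(m1, m23)
            elif chi_rejects and not parabolic_rejects:
                chi_only[(y, z)] = null_probability(m1, m23)
    return parabolic_only, chi_only

parabolic_only, chi_only = disagreements()
print(sorted(chi_only))
print(sum(parabolic_only.values()), sum(chi_only.values()))
```

Under these assumptions the samples rejected only by the $\chi^2$ test coincide with the eleven $(y, z)$ pairs listed in (b), and the two total probabilities agree with the quoted 0.00320 and 0.00038 to within about 0.00001.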
It will be noted that the above results imply that

(111) $\beta_{32}(w_{32} \mid 0) - \beta_{32}(w_{\chi^2} \mid 0) = 0.00320 - 0.00038 = 0.00282$;

i.e. that the true levels of significance of the two tests are not equal. This is
to be expected, because of the discontinuity of the probability distribution of
sample points, which makes it unlikely that the level of significance of either
test is exactly .05.

Similarly we can obtain values of $\beta_{32}(w_{32} \mid \theta) - \beta_{32}(w_{\chi^2} \mid \theta)$, the differences in
the powers of the two tests with respect to various alternative hypotheses.
These values were obtained for a few values of $\theta$.

    θ      $\beta_{32}(w_{32} \mid \theta) - \beta_{32}(w_{\chi^2} \mid \theta)$
  -0.5      0.01625
   0.0      0.00282
   0.5     -0.00006

These figures indicate that the parabolic test detects negative $\theta$'s better than
the $\chi^2$ test, but that the $\chi^2$ test detects positive $\theta$'s better than the parabolic
test, although the advantage in this latter case is minute.

The critical regions associated with the two tests may be represented by
regions in the (y, z) plane. The critical region for the parabolic test will be
defined by

(112) $(2y - z)^2 - \tfrac32 (2y + z) > \nu$

and that for the $\chi^2$ test, $w_{\chi^2}$, by

(113) $(2y - z)^2 > \nu'$

where $\nu \approx \nu'$.
$w_{\chi^2}$ is therefore the complement of the region lying between the lines $L_1$, $L_2$
with equations $2y - z = \pm\sqrt{\nu'}$; $w_M$ lies outside the parabola K with equation
$(2y - z)^2 - \tfrac32 (2y + z) = \nu$.
Since $\nu \approx \nu'$, K meets $L_1$, $L_2$ at points near the respective intersections of
$L_1$, $L_2$ with the line $2y + z = 0$. See Figure 1.

In the diagram the regions $V_1$, $V_2$ contain all sample points for which the
$\chi^2$ test rejects and the parabolic test accepts $H_0$; $U_1$, $U_2$ contain all sample
points for which the $\chi^2$ test accepts and the parabolic test rejects $H_0$.

For a given value of $\theta$ it is known that the probability distribution is approxi-
mately such that the quantity

(114) $\chi_\theta^2 = \dfrac{\{y - \frac{1}{16}M(e^\theta - 1)\}^2}{\frac{1}{16}M + \frac{1}{16}M(e^\theta - 1)} + \dfrac{\{z + \frac18 M(e^\theta - 1)\}^2}{\frac38 M - \frac18 M(e^\theta - 1)} + \dfrac{\{y + z + \frac{1}{16}M(e^\theta - 1)\}^2}{\frac{9}{16}M + \frac{1}{16}M(e^\theta - 1)}$

is distributed as $\chi^2$ with 2 degrees of freedom.

The ellipses of equal density $\chi_\theta^2$ = constant have centers at points ($\tfrac{1}{16}M[e^\theta - 1]$,
$-\tfrac18 M[e^\theta - 1]$) which must lie on the line $2y + z = 0$. When $\theta = 0$ the center
is at the origin, and the major and minor axes of the ellipse make angles of
approximately 99.5° and 9.5° respectively with the y-axis. For small changes
in $\theta$ the angles of inclination of the major and minor axes of the ellipse to the
coordinate axes are not greatly changed, and we see that as the center of the
ellipse moves along the line $2y + z = 0$ we have

(1) $\theta$ increasing: center moves downwards, tending to increase $P\{E \in U_2\} - P\{E \in V_2\}$
while $P\{E \in V_1\}$ and $P\{E \in U_1\}$ both become small. Thus $\beta_M(w_M \mid \theta)$
tends to increase quicker than $\beta_M(w_{\chi^2} \mid \theta)$.

(2) $\theta$ decreasing: here we have the opposite effect and $\beta_M(w_M \mid \theta)$ tends to
increase slower than $\beta_M(w_{\chi^2} \mid \theta)$.

These conclusions agree qualitatively with those drawn in the case M = 32.
(N.B. In the case M = 32 no sample points fall into the region $U_1$ because no
points in $U_1$ satisfy the inequalities (25), (26)).


Fig. 1


8. Some Geometrical Considerations. In this section we shall consider the
manner in which the situations dealt with above may be interpreted in terms
of geometrical concepts. It will be convenient to consider as variables $n_i = m_i/M$.
The sample space W(n) is then bounded by the four planes

(115) $n_i = 0$ (i = 1, 2, 3), $\sum_{i=1}^{3} n_i = 1$.


In this space, corresponding to any admissible hypothesis $H_\theta$ specifying a
value of $\theta$, there is a point $T_\theta$ with coordinates (${}_\theta n_1$, ${}_\theta n_2$, ${}_\theta n_3$) where

(116) ${}_\theta n_1 = P_1 P_2 e^\theta$,
      ${}_\theta n_2 = P_1(1 - P_2 e^\theta)$,
      ${}_\theta n_3 = P_2(1 - P_1 e^\theta)$.

These are the proportions of results expected in the first three cells, if the
hypothesis $H_\theta$ specifying $\theta$ be true.
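Now, if $H_\theta$ be true, we have

(117) $P\{m_1 = Mn_1, m_2 = Mn_2, m_3 = Mn_3, m_4 = Mn_4 \mid H_\theta\} \approx c\, e^{-\frac12 \chi_\theta^2}$

where c is constant for a fixed sample size M, and

(118) $\chi_\theta^2 = M \left[ \sum_{i=1}^{3} \dfrac{(n_i - {}_\theta n_i)^2}{{}_\theta n_i} + \dfrac{\left\{\sum_{i=1}^{3} (n_i - {}_\theta n_i)\right\}^2}{1 - \sum_{i=1}^{3} {}_\theta n_i} \right]$.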

Hence the most frequent position(s) of the sample point E will be some-
where near the point $T_\theta$, which I shall therefore call the center of density. It
will be noticed that, whatever be the value of $\theta$, the point $T_\theta$ must lie on the line

(119) $n_1 - P_1 P_2 = -[n_2 - P_1(1 - P_2)] = -[n_3 - P_2(1 - P_1)]$.

This line, a segment of which is the locus of the center of density for our set of
admissible hypotheses, will be called the line of density.

In this space the parabolic test corresponds to a critical region comprising the
exterior of a parabolic cylinder. The equation of the boundary of this critical
region at level of significance .05 was found for the case $P_1 = P_2 = \tfrac14$, and a
model made of it. Also included in the model were the ellipsoids

(120) $\chi_\theta^2 = K_{.05}$

where $K_{.05}$ is a constant so chosen that

(121) $P\{\chi_\theta^2 > K_{.05} \mid \theta\} = .05$

corresponding to

(i) the case when $H_0$ is true

(ii) the cases when

(122) (a) $p_1 = \tfrac{3}{32}$; $p_2 = p_3 = \tfrac{5}{32}$; $p_4 = \tfrac{19}{32}$, i.e. $\theta = 0.41$
(123) (b) $p_1 = \tfrac{1}{32}$; $p_2 = p_3 = \tfrac{7}{32}$; $p_4 = \tfrac{17}{32}$, i.e. $\theta = -0.69$.

It was found that in the case $P_1 = P_2 = \tfrac14$ one axis of all the $\chi_\theta^2$-ellipsoids
was perpendicular to the plane through the line of density and the axis of $n_1$.
The generators of the boundary of the parabolic acceptance region are also
perpendicular to this plane. (By "acceptance region" is meant the complement
of the critical region. The acceptance region may be written symbolically
$\bar{w}_M$.) There were further added to the model the intersections with this plane
of the ellipsoids at probability level .01, corresponding to the three hypotheses
considered above ($\theta = 0, 0.41, -0.69$) and two others, viz.

(124) $p_1 = \tfrac{5}{32}$; $p_2 = p_3 = \tfrac{3}{32}$; $p_4 = \tfrac{21}{32}$, i.e. $\theta = 0.92$,
(125) $p_1 = \tfrac{1}{64}$; $p_2 = p_3 = \tfrac{15}{64}$; $p_4 = \tfrac{33}{64}$, i.e. $\theta = -1.39$.
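
The cell probabilities quoted for the model can be related to $\theta$ through (116). The sketch below is illustrative only: it assumes $P_1 = P_2 = \tfrac14$, takes the fourth cell as the remainder $1 - p_1 - p_2 - p_3$, and reads the first-cell probabilities of the four alternatives as 3/32, 1/32, 5/32 and 1/64; the function names are hypothetical.

```python
from fractions import Fraction
from math import exp, log

def cell_probabilities(theta, P1, P2):
    """Cell probabilities (p1, p2, p3, p4) from (116); p4 is taken as the remainder."""
    e = exp(theta)
    p1 = P1 * P2 * e
    p2 = P1 * (1 - P2 * e)
    p3 = P2 * (1 - P1 * e)
    return p1, p2, p3, 1 - p1 - p2 - p3

def theta_from_p1(p1, P1, P2):
    """Invert the first line of (116): theta = log(p1 / (P1 * P2))."""
    return log(p1 / (P1 * P2))

quarter = Fraction(1, 4)
for p1 in (Fraction(3, 32), Fraction(1, 32), Fraction(5, 32), Fraction(1, 64)):
    theta = theta_from_p1(p1, quarter, quarter)
    print(p1, round(theta, 2), cell_probabilities(theta, 0.25, 0.25))
```

Under these assumptions the four first-cell probabilities correspond to $\theta$ = 0.41, -0.69, 0.92 and -1.39 when rounded to two decimals.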


For convenience in making the model to a simple scale (1 unit = 150 cms.) it
was found necessary to take the sample size M as 1312.5. The model is shown
in Figure 2. It will be seen that the acceptance region for the parabolic test
is approximately enclosed between two parallel planes perpendicular to the
plane common to the line of density and the axis of $n_1$. These two planes, in
fact, enclose the acceptance region for the $\chi^2$ test. The vertex of the normal
parabolic section of the parabolic acceptance region is at a comparatively great
distance "below" the plane $n_1 = 0$.

Fig. 2

As an interesting digression we may use our model to compare qualitatively
the parabolic test with yet a third possible test of $H_0$. This test is to reject
$H_0$ at level of significance .05 if

(126) $\chi_0^2 > K_{.05}$

and may be called the $\chi_0^2$ test. The $\chi_0^2$-ellipsoid shown in the model is the ac-
ceptance region for this test. It will be noticed that when $\theta \neq 0$ the ellipsoids
of equal density include somewhat more of the acceptance region of the $\chi_0^2$ test
than of the parabolic acceptance region. This means that the $\chi_0^2$ test would
detect that the hypothesis $H_0(\theta = 0)$ is false in these cases, less frequently than
would the parabolic and $\chi^2$ tests. We also notice that the center of density
$T_\theta$ leaves the parabolic acceptance region before it leaves the acceptance region
of the $\chi_0^2$ test as it moves along the line of density from the point where $\theta = 0$,
whether the direction of motion of $T_\theta$ corresponds to $\theta$ increasing or decreasing.
This also indicates that the $\chi_0^2$ test would act less efficiently than the other
two tests.


9. Appendix. In this appendix are obtained various results which, while 
essential to the main argument, would appear as digressions if they were inter- 
polated as required. The numbering of equations in this appendix does not 
continue from that of the previous sections, but forms a separate group. 


LEMMA. If $f_0(m), f_1(m), \dots, f_s(m)$ be (s + 1) functions of the k variables
$m_1, m_2, \dots, m_k$ which are zero except for a finite number of sets of integral values
of $m_1, \dots, m_k$; and if $w_0$ be a region in the space of m's such that

(1) $f_0(m) \ge \sum_{i=1}^{s} a_i f_i(m)$ in $w_0$
(2) $f_0(m) \le \sum_{i=1}^{s} a_i f_i(m)$ in $\bar{w}_0$

$a_1, a_2, \dots, a_s$ being arbitrary constants; then if w be any region such that

(3) $\sum_{w} f_i(m) = \sum_{w_0} f_i(m)$ (i = 1, ..., s),

we shall have

(4) $\sum_{w} f_0(m) \le \sum_{w_0} f_0(m)$.

PROOF. Let

(5) $\delta = \sum_{w_0} f_0(m) - \sum_{w} f_0(m) = \sum_{w_0 - ww_0} f_0(m) - \sum_{w - ww_0} f_0(m)$

where $ww_0$ denotes the common part of w and $w_0$.

Hence the region $w - ww_0$, consisting of those points of w which are not in
$ww_0$, and so not in $w_0$, is contained in $\bar{w}_0$. Similarly the region $w_0 - ww_0$ is
contained in $w_0$. Hence, by inequalities (1) and (2),

(6) $\delta \ge \sum_{w_0 - ww_0} \left\{\sum_{i=1}^{s} a_i f_i(m)\right\} - \sum_{w - ww_0} \left\{\sum_{i=1}^{s} a_i f_i(m)\right\}$

and so

(7) $\delta \ge \sum_{w_0} \left\{\sum_{i=1}^{s} a_i f_i(m)\right\} - \sum_{w} \left\{\sum_{i=1}^{s} a_i f_i(m)\right\}$.

Since the total number of terms in each double summation is finite, we have

(8) $\delta \ge \sum_{i=1}^{s} a_i \left\{\sum_{w_0} f_i(m) - \sum_{w} f_i(m)\right\}$.

But

(9) $\sum_{w} f_i(m) = \sum_{w_0} f_i(m)$, (i = 1, ..., s).

Hence

$\delta \ge 0$, and $\sum_{w} f_0(m) \le \sum_{w_0} f_0(m)$.

A lemma similar to the lemma above, where the f's are taken to be integrable
functions and summation over the regions w, $w_0$ is replaced by integration over
these regions, is given by Neyman and Pearson [9]. The proof given above
follows the lines of the proof given in that paper.

THEOREM 1. Suppose that, in a quadrinomial population:

(i) the cell probabilities are dependent on the number M of trials made, and are
given by

(10) $p_1 = p_{01} + \delta_M$,
     $p_2 = p_{02} - \delta_M$,
     $p_3 = p_{03} - \delta_M$,
     $p_4 = p_{04} + \delta_M$

where

(11) $\sum_{i=1}^{4} p_{0i} = \sum_{i=1}^{4} p_i = 1$

and
(12) ou = Me™ — 1) 
(ii)
(13) $x_i = (m_i - Mp_{0i})/(Mp_{0i})^{1/2}$ (i = 1, 2, 3, 4)

where $m_i$ = number of results falling in i-th cell.

(iii) $w(x)$, or briefly $w$, is a region in the space W of $x_1, x_2, x_3$; and $P_M(w)$
is the integral probability law of w corresponding to the values $p_1, p_2, p_3, p_4$ of
the cell probabilities given in (i) above when we have M independent trials.

Then 


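(14) $P_M(w) \to \dfrac{1}{(2\pi)^{3/2} p_{04}^{1/2}} \iiint_{w} e^{-\frac12 Q_0(x_1, x_2, x_3)}\, dx_1\, dx_2\, dx_3$

uniformly over W as $M \to \infty$, where

(15) $Q_0(x_1, x_2, x_3) = \sum_{i=1}^{3} x_i^2 (1 + p_{0i} p_{04}^{-1}) + 2p_{04}^{-1} \sum_{i<j} x_i x_j (p_{0i} p_{0j})^{1/2}$
     $- 2\lambda\{x_1(p_{01}^{-1/2} - p_{01}^{1/2} p_{04}^{-1}) - x_2(p_{02}^{-1/2} + p_{02}^{1/2} p_{04}^{-1}) - x_3(p_{03}^{-1/2} + p_{03}^{1/2} p_{04}^{-1})\} + \lambda^2 \sum_{i=1}^{4} p_{0i}^{-1}$.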

This theorem may be proved by the same method as that used by F. N. 
David [2] in proving the generalized theorem of Laplace. 


I would like to thank Professor Neyman for his invaluable suggestions and
advice in the preparation of this paper.


REDUCTION OF A CERTAIN CLASS OF COMPOSITE
STATISTICAL HYPOTHESES


By George W. Brown


1. Introduction. A situation frequently met in sampling theory is the fol-
lowing: x has distribution $f(x, \theta)$, where $\theta$ is an unknown parameter, and for
samples $(x_1, \dots, x_n)$ there exists in the sample space $E_n$ a family of (n - 1)-
dimensional manifolds upon each of which the distribution is independent of
$\theta$; in addition there is a residual one-dimensional manifold available for estimat-
ing $\theta$. For example, suppose there exists a sufficient statistic T for $\theta$, then on
the manifolds $T = T_0$ there is defined an induced distribution which is inde-
pendent of the parameter.

A similar situation is observed when $\theta$ is a "location" or "scale" parameter.
Let x have the distribution $f(x - a)$ for some a, then the set $(x_2 - x_1, x_3 - x_1, \dots, x_n - x_1)$,
or any equivalent set, such as $(x_2 - \bar{x}, \dots, x_n - \bar{x})$, have a
joint distribution independent of a, and there is a residual distribution corre-
sponding to each particular configuration $(x_2 - x_1, \dots, x_n - x_1)$. Fisher
[1] and Pitman [5] have examined the residual distributions in connection with
the problem of estimating scale and location parameters. In this paper we
shall be concerned primarily, not with the residual distribution, but with the
remainder of the sample information, corresponding to the (n - 1)-dimensional
distribution which is independent of the parameter. It is found, in a rather
broad class of distributions, that the part of the sample not used for estimation
determines, except for the parameter value, the original functional form of the
distribution of x.

This paper is devoted mainly to a study of particular classes of distributions
having the property mentioned above. We consider also the theoretical appli-
cation of this property to certain types of composite hypotheses which may be
reduced thereby to equivalent simple hypotheses.¹ The principal results of this
nature may be summed up as follows: If x has distribution of the form $f(x, \theta)$,
where $\theta$ is either a location or scale parameter, or a vector denoting both, then
there exists, in samples $(x_1, \dots, x_n)$ a set of functions $y_j(x_1, \dots, x_n)$, j =
1, 2, ..., p, p < n, having joint distribution $D(y_1, \dots, y_p)$ independent of $\theta$,
and such that the converse statement holds, namely, if $\{y_j\}$ have the distribution
$D(y_1, \dots, y_p)$, then x has, for some $\theta$, a distribution of the form $f(x, \theta)$. There
is a corresponding statement when x has a distribution of the form $f(x - \sum a_i u_i)$,
where the $\{a_i\}$ are parameters, and the $\{u_i\}$ are regression variables.


¹ We use the terms simple and composite hypotheses in the sense of Neyman and
Pearson [2].

2. Location and Scale. This section is devoted to the study of functions of
the sample observations which are such that their distributions determine the
distribution of x, except possibly for location and scale.

It will be assumed that associated with x there is a function F(x) such that

(a) F(x) is monotone non-decreasing,

(b) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$, and (c) $F(\infty) = \lim_{x \to \infty} F(x) = 1$

with the normalization F(x) upper semi-continuous. F(x) is the probability
that the random variate takes a value less than or equal to x. If F(x) is as-
sociated with the random variate x we say that x has the distribution F(x).
If g(x) is a Borel-measurable function, the Lebesgue-Stieltjes integral
$\int_{-\infty}^{\infty} g(x)\, dF(x)$ is denoted by $E[g(x)]$.

The characteristic function $\varphi(t) = E(e^{itx})$ determines F(x), that is, if
$\int_{-\infty}^{\infty} e^{itx}\, dG(x) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$, then F(x) = G(x).

Similarly, let $F(x_1, \dots, x_k)$ be such that

(a) $F(x_1, \dots, x_{i-1}, x_i + h, x_{i+1}, \dots, x_k) \ge F(x_1, \dots, x_i, \dots, x_k)$ for
h > 0 and i = 1, 2, ..., k;

(b) $\lim_{x_i \to -\infty} F(x_1, \dots, x_k) = 0$, i = 1, 2, ..., k;

(c) $\lim_{x_1, \dots, x_k \to \infty} F(x_1, \dots, x_k) = 1$;

with the normalization $F(x_1, \dots, x_k)$ continuous on the right in each $x_i$. If
$F(x_1, \dots, x_k)$ is associated with $x_1, \dots, x_k$ we say that $x_1, \dots, x_k$ have the
joint distribution $F(x_1, \dots, x_k)$. As before, $E[H(x_1, \dots, x_k)] = \int_{R_k} H\, dF$,
where $R_k$ is the Euclidean k-space. It is well known that under such condi-
tions, given Borel-measurable functions $y_j(x_1, \dots, x_k)$, j = 1, ..., p, p < k,
then $G(y_1, \dots, y_p) = \int_{R(y)} dF(x_1, \dots, x_k)$, where R(y) is the region
$[y_1(x_1, \dots, x_k) \le y_1, \dots, y_p(x_1, \dots, x_k) \le y_p]$, is again a distribution function satisfying
the conditions above. Moreover, $\int_{R} g(y_1, \dots, y_p)\, dG(y_1, \dots, y_p) =
\int_{R'} g[y_1(x_1, \dots, x_k), \dots, y_p(x_1, \dots, x_k)]\, dF$, where R' is the set of all points
$(x_1, \dots, x_k)$ such that $[y_1(x_1, \dots, x_k), \dots, y_p(x_1, \dots, x_k)] \in R$.

If x has distribution F(x), then, by definition, the set $(x_1, \dots, x_n)$ is a sample
from this distribution if $x_1, \dots, x_n$ have the joint distribution $F(x_1) \cdots F(x_n)$.

The following theorem states that two distributions giving rise, in sampling,
to the same distribution of the set $x_1 - x_n, x_2 - x_n, \dots, x_{n-1} - x_n$, with
$n \ge 3$, can differ at most by a translation, that is, the distribution of that set
determines the original distribution except for location.

THEOREM Ia: Let x have the distribution F(x). Denote by S the set of zeros of
$\int e^{itx}\, dF(x)$ and denote by $\epsilon$ the g.l.b. of |t| for t in S. Suppose that the comple-
ment of S is $\epsilon$-connected.² Suppose that x' has distribution G(x'), and let $x_1, \dots, x_n$
and $x_1', \dots, x_n'$ be samples. Then the set $w_\alpha = x_\alpha - x_n$, $\alpha = 1, \dots, n - 1$,
have the same joint distribution as the set $w_\alpha' = x_\alpha' - x_n'$ if and only if there exists
a constant a such that x' + a and x have the same distribution.

PROOF: The sufficiency of the condition follows immediately, since $w_\alpha =
x_\alpha - x_n = (x_\alpha + a) - (x_n + a)$.

In establishing necessity, only the fact that $w_1$, $w_2$ have the same joint dis-
tribution as $w_1'$, $w_2'$ is needed. This hypothesis implies that

$E\{e^{i[t_1 w_1 + t_2 w_2]}\} = E\{e^{i[t_1 w_1' + t_2 w_2']}\}$

that is,

$E\{e^{i[t_1(x_1 - x_3) + t_2(x_2 - x_3)]}\} = E\{e^{i[t_1(x_1' - x_3') + t_2(x_2' - x_3')]}\}$.

Set $\varphi(t) = E(e^{itx})$, $\psi(t) = E(e^{itx'})$. The relation above becomes

(1) $\varphi(t_1)\varphi(t_2)\varphi(-t_1 - t_2) = \psi(t_1)\psi(t_2)\psi(-t_1 - t_2)$.

Consider equation (1) for values of $t_1$, $t_2$ in the neighborhood of t = 0. $\varphi(0) =
\psi(0) = 1$, hence there is an interval $|t| < \delta$, in which $\varphi(t)$ and $\psi(t)$ do not
vanish. It is easily shown that $\varphi(t)$ and $\psi(t)$ are each continuous, since $e^{itx}$, in
the neighborhood of t = 0, is continuous uniformly for any bounded interval
of x, and since A may be chosen so that 1 - F(A) and F(-A) are both as small
as desired. In the interval $|t| < \delta$ the function $f(t) = \varphi(t)/\psi(t)$ is continuous.
Also, $\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$. Setting $t_2 = 0$ in (1) we obtain
$\varphi(t)\varphi(-t) = \psi(t)\psi(-t)$, hence $|\varphi(t)| = |\psi(t)|$, that is, $|f(t)| = 1$. f(t) takes
values on the unit circle of the complex plane, and f(0) = 1, hence there is an
interval $|t| < \delta'$ such that z = f(t) lies on an arc $\gamma$, of length less than $2\pi$,
containing the point z = 1. Now consider the functional equation (1) for
$|t_1| < \tfrac12\delta'$, $|t_2| < \tfrac12\delta'$. (1) becomes

$f(t_1)f(t_2)f(-t_1 - t_2) = 1$.

The interval $|t| < \delta'$ was so chosen that for $|t_1| < \tfrac12\delta'$, $|t_2| < \tfrac12\delta'$, it is possible
to define a single-valued branch of the argument of $f(t_1)$, $f(t_2)$, and $f(t_1 + t_2)$.
Letting $t_2 = 0$ we have $f(t)f(-t) = 1$, hence, replacing $f(-t_1 - t_2)$ by $1/f(t_1 + t_2)$
in the last equation, we have

$f(t_1)f(t_2) = f(t_1 + t_2)$.

Arg $f(t_1)$, arg $f(t_2)$, and arg $f(t_1 + t_2)$ are uniquely determined, except for some
fixed multiple of $2\pi$. If we choose the principal value of the argument, i.e., so
that $0 \le \arg f(t) < 2\pi$, we must have

$\arg f(t_1) + \arg f(t_2) = \arg f(t_1 + t_2)$

for $|t_1| < \tfrac12\delta'$, $|t_2| < \tfrac12\delta'$. Since arg f(t) is continuous, any solution of this well
known functional equation must be of the form arg f(t) = at. $|f(t)| = 1$,
therefore there exists a constant a such that $f(t) = e^{iat}$ for $|t| < \tfrac12\delta'$, that is,
$\varphi(t) = e^{iat}\psi(t)$, for $|t| < \tfrac12\delta'$. By use of (1) this may be extended to hold for
all t such that $|t| < \epsilon$, where $\epsilon$ is the minimum modulus of all t such that $\varphi(t) = 0$.
(1) may now be used to extend the relation for all t such that $\varphi(t) \neq 0$ by choos-
ing an $\epsilon$-chain connecting the origin to the point t. We know already that
$\varphi(t) = e^{iat}\psi(t)$ if $\varphi(t) = 0$, hence it holds for all t. This relation says that
$E(e^{itx}) = E(e^{it(x' + a)})$, hence x' + a and x have the same distribution, thus
completing the demonstration of the theorem.

² The set S is $\epsilon$-connected if any two points p, q, in S can be connected by an $\epsilon$-chain,
i.e., there exists a set $p_0 = p, p_1, \dots, p_{n-1}, p_n = q$, such that $|p_i - p_{i-1}| < \epsilon$, i = 1,
..., n.
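
The sufficiency half of the argument can be illustrated numerically. The sketch below is not part of the original proof: it takes, as an assumed example, the unit exponential distribution with characteristic function $1/(1 - it)$, and checks that the product appearing on either side of (1) is unchanged when the variable is translated by a constant a, since the factors $e^{iat_1} e^{iat_2} e^{-ia(t_1 + t_2)}$ cancel. The function names are hypothetical.

```python
import cmath

def cf_exponential(t):
    """Characteristic function of the unit exponential distribution."""
    return 1 / (1 - 1j * t)

def cf_shifted(t, a):
    """Characteristic function of x + a when x is unit exponential."""
    return cmath.exp(1j * a * t) * cf_exponential(t)

def triple_product(cf, t1, t2):
    """The product cf(t1) cf(t2) cf(-t1 - t2) appearing in relation (1)."""
    return cf(t1) * cf(t2) * cf(-t1 - t2)

a = 2.7
lhs = triple_product(cf_exponential, 0.3, -1.1)
rhs = triple_product(lambda t: cf_shifted(t, a), 0.3, -1.1)
print(abs(lhs - rhs))                                     # difference of the two sides of (1)
print(abs(cf_exponential(0.3) / cf_shifted(0.3, a)))      # modulus of f(t) = phi(t) / psi(t)
```

The second printed quantity is the modulus of the ratio f(t), which the proof shows must equal 1.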

It should be remarked that the set $(x_1 - x_n, \dots, x_{n-1} - x_n)$ may be replaced
in Theorem Ia by any equivalent set, for example, $(x_1 - \bar{x}, \dots, x_{n-1} - \bar{x})$.


The next result is of the same nature as Theorem Ia except for the replace-
ment of the location parameter by a scale (positive or negative) parameter.

THEOREM Ib: Let x have distribution F(x), such that the zeros of $\int e^{it \log|x|}\, dF(x)$
are nowhere dense, and let x' have distribution G(x'). Let $x_1, \dots, x_n$ and
$x_1', \dots, x_n'$ be samples from the distributions of x and x', with $n \ge 3$, then the set
$w_\alpha = x_\alpha/x_n$, $\alpha = 1, \dots, n - 1$, have the same distribution as the set $w_\alpha' =
x_\alpha'/x_n'$ if and only if there exists a constant c such that cx' and x have the same
distribution.

PROOF: The sufficiency of the condition is evident. Suppose, then, as before,
that $w_1$, $w_2$ have the same joint distribution as $w_1'$, $w_2'$. $\log|w_1|$ and $\log|w_2|$
have the same joint distribution as $\log|w_1'|$ and $\log|w_2'|$, hence by application
of Theorem Ia to $\log|x|$ and $\log|x'|$ it follows (since the complement of a
nowhere dense set is $\epsilon$-connected for every $\epsilon$) that there exists a constant a such
that

$\int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it[\log|x'| - a]}\, dG(x')$.

Let $y = e^{-a}x'$, then |x| and |y| have the same distribution, and

(2) $\int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it \log|y|}\, dH(y)$,

where y has distribution H(y). We now have to show that either y or -y has
the distribution of x, that is, it must be shown that either H(y) = F(y), or
H(y) = 1 - F(-y).

By the first part of the theorem the functions $u_1 = y_1/y_3$ and $u_2 = y_2/y_3$ have
the same joint distribution as $w_1'$, $w_2'$. It is clear that the mean value of any
function of $u_1$ and $u_2$ is the same as the mean value of the corresponding func-
tion of $w_1$ and $w_2$. Hence

$\iiint e^{i(t_1 \log|w_1| + t_2 \log|w_2|)} \operatorname{sgn} w_1 \operatorname{sgn} w_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$
$= \iiint e^{i(t_1 \log|u_1| + t_2 \log|u_2|)} \operatorname{sgn} u_1 \operatorname{sgn} u_2\, dH(y_1)\, dH(y_2)\, dH(y_3)$,

where sgn x = 1 for x > 0, sgn x = -1 for x < 0.

$(\operatorname{sgn} w_1)(\operatorname{sgn} w_2) = (\operatorname{sgn} x_1)(\operatorname{sgn} x_2)$,

so that the last equation becomes

(3) $\iiint e^{i[t_1(\log|x_1| - \log|x_3|) + t_2(\log|x_2| - \log|x_3|)]} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$
$= \iiint e^{i[t_1(\log|y_1| - \log|y_3|) + t_2(\log|y_2| - \log|y_3|)]} \operatorname{sgn} y_1 \operatorname{sgn} y_2\, dH(y_1)\, dH(y_2)\, dH(y_3)$.

Set

$\varphi_1(t) = \int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x)$; $\psi_1(t) = \int_{-\infty}^{\infty} e^{it \log|y|}\, dH(y)$;

$\varphi_2(t) = \int_{-\infty}^{\infty} e^{it \log|x|} \operatorname{sgn} x\, dF(x)$; $\psi_2(t) = \int_{-\infty}^{\infty} e^{it \log|y|} \operatorname{sgn} y\, dH(y)$.

From (3) we have $\psi_2(t_1)\psi_2(t_2)\psi_1(-t_1 - t_2) = \varphi_2(t_1)\varphi_2(t_2)\varphi_1(-t_1 - t_2)$ for all
$t_1$, $t_2$, and from (2) we have $\psi_1(t) = \varphi_1(t)$ for all t, hence, if $\psi_1(-t_1 - t_2) \neq 0$,
$\psi_2(t_1)\psi_2(t_2) = \varphi_2(t_1)\varphi_2(t_2)$. By hypothesis the zeros of $\psi_1(t)$ are nowhere dense,
hence if $\psi_1(-t_1 - t_2) = 0$ there is a sequence $t^{(n)}$ such that $t^{(n)} \to -t_1 - t_2$
and $\psi_1(t^{(n)}) \neq 0$. Now take an arbitrary sequence $t_1^{(n)}$ such that $t_1^{(n)} \to t_1$,
then $t_2^{(n)} = -t^{(n)} - t_1^{(n)}$ must tend to $t_2$. For each n we have $\psi_2(t_1^{(n)})\psi_2(t_2^{(n)}) =
\varphi_2(t_1^{(n)})\varphi_2(t_2^{(n)})$. All the functions appearing are continuous, thus we see that
$\psi_2(t_1)\psi_2(t_2) = \varphi_2(t_1)\varphi_2(t_2)$ for all $t_1$, $t_2$. From this it follows directly that either
$\psi_2(t) = \varphi_2(t)$ for all t or $\psi_2(t) = -\varphi_2(t)$ for all t. We have³

$\varphi_1(t) = \int_{0}^{\infty} e^{it \log x}\, dF(x) + \int_{-\infty}^{0} e^{it \log(-x)}\, dF(x)$,

$\varphi_2(t) = \int_{0}^{\infty} e^{it \log x}\, dF(x) - \int_{-\infty}^{0} e^{it \log(-x)}\, dF(x)$,

$\psi_1(t) = \int_{0}^{\infty} e^{it \log x}\, dH(x) + \int_{-\infty}^{0} e^{it \log(-x)}\, dH(x)$,

and $\psi_2(t) = \int_{0}^{\infty} e^{it \log x}\, dH(x) - \int_{-\infty}^{0} e^{it \log(-x)}\, dH(x)$.

Combining these expressions with the relations obtained above leads, by Fourier
inversion, to the result that either F(x) = H(x) or H(x) = 1 - F(-x). We
have shown that either y or -y has the same distribution as x, that is, either
$e^{-a}x'$ or $-e^{-a}x'$ has the same distribution as x.

³ The assumption has been made implicitly that F(x) and G(x) are continuous at x = 0,
otherwise the distribution of $x_i/x_n$ is not properly defined, and the functions $\varphi_i(t)$ and $\psi_i(t)$
are then not defined. Similar assumptions will be made whenever necessary in later
theorems.

Theorem Ib states essentially that the joint distribution of the set $x_\alpha/x_n$,
$\alpha = 1, \dots, n - 1$, determines the distribution of x except for a scale parameter
and possibly a reflection. In the event that x has an asymmetrical distribution,
and if it is desired to rule out negative changes of scale, a variation of this pro-
cedure is necessary. The next result is appropriate for this situation.


THEOREM Ic: Let x have distribution F(x) such that the zeros of $\int e^{it \log|x|}\, dF(x)$
are nowhere dense, and let x' have distribution G(x'). Let $x_1, \dots, x_n$ and
$x_1', \dots, x_n'$ be samples from the distributions of x and x', with $n \ge 3$. Express
$x_1, \dots, x_n$ and $x_1', \dots, x_n'$ in spherical coordinates

$x_1 = r \cos\theta_1$, $x_1' = r' \cos\theta_1'$
$x_2 = r \sin\theta_1 \cos\theta_2$, $x_2' = r' \sin\theta_1' \cos\theta_2'$
...
$x_n = r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-1}$, $x_n' = r' \sin\theta_1' \sin\theta_2' \cdots \sin\theta_{n-1}'$.

Then $\theta_1, \dots, \theta_{n-1}$ have the same joint distribution as $\theta_1', \dots, \theta_{n-1}'$ if and only
if there exists a positive constant k such that kx' and x have the same distribution.

PROOF: Sufficiency of the condition is an immediate consequence of the fact
that $\theta_1, \dots, \theta_{n-1}$ are invariant under the transformation x = kx', with k > 0.
If $\theta_1, \dots, \theta_{n-1}$ have the same joint distribution as $\theta_1', \dots, \theta_{n-1}'$ then the set
$\{x_\alpha/x_n\}$ have the same joint distribution as the set $\{x_\alpha'/x_n'\}$, hence, by Theorem
Ib, there exists a constant c such that cx' has the same distribution as x. To
establish necessity of the condition we must show that |c|x' has the same
distribution as x.

Set y = |c|x', and let $y_1, \dots, y_n$ be expressed in spherical coordinates;
$y_1, \dots, y_n$ have the same angular coordinates $\theta_1', \dots, \theta_{n-1}'$. This implies
that $x_1/r$ and $x_2/r$ have the same joint distribution as $y_1/R$ and $y_2/R$, where
$R = \sqrt{y_1^2 + \cdots + y_n^2}$. Now $(x_1/r)/|x_2/r| = x_1/|x_2|$, therefore $x_1/|x_2|$ has the same dis-
tribution as $y_1/|y_2|$, so that

$\iint e^{it \log|x_1/x_2|} \operatorname{sgn}\left(\dfrac{x_1}{|x_2|}\right) dF(x_1)\, dF(x_2) = \iint e^{it \log|y_1/y_2|} \operatorname{sgn}\left(\dfrac{y_1}{|y_2|}\right) dH(y_1)\, dH(y_2)$

if y has distribution H(y). $\operatorname{sgn}(x_1/|x_2|) = \operatorname{sgn} x_1$, so that the last equation
yields

$\int e^{it \log|x|} \operatorname{sgn} x\, dF(x) \cdot \int e^{-it \log|x|}\, dF(x) = \int e^{it \log|x|} \operatorname{sgn} x\, dH(x) \cdot \int e^{-it \log|x|}\, dH(x)$.

We know already that |x| and |y| have the same distribution, so that

(4) $\int e^{-it \log|x|}\, dF(x) = \int e^{-it \log|x|}\, dH(x)$,

thus

(5) $\int e^{it \log|x|} \operatorname{sgn} x\, dF(x) = \int e^{it \log|x|} \operatorname{sgn} x\, dH(x)$,

except possibly for zeros of $\int e^{-it \log|x|}\, dF(x)$. By hypothesis the exceptional
points are nowhere dense, so that, by continuity, (5) holds for all t. (4) and
(5) together imply, as in the proof of Theorem Ib, that F(x) = H(x), i.e., x and
|c|x' have the same distribution.
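
The invariance used in the sufficiency part of Theorem Ic, and the reason a negative scale factor is excluded, can be seen on a single point. The sketch below is an illustration under assumed conventions rather than anything taken from the paper: the first n - 2 angles are taken in $[0, \pi]$, the last in $[0, 2\pi)$, points at which a trailing sum of squares vanishes are excluded, and the function name `spherical_angles` is hypothetical.

```python
from math import acos, atan2, isclose, pi, sqrt

def spherical_angles(x):
    """Angular coordinates (theta_1, ..., theta_{n-1}) of a point with n >= 2 coordinates."""
    angles = []
    for i in range(len(x) - 2):
        tail = sqrt(sum(v * v for v in x[i:]))      # r sin(theta_1) ... sin(theta_i)
        angles.append(acos(x[i] / tail))            # angle in [0, pi]
    angles.append(atan2(x[-1], x[-2]) % (2 * pi))   # last angle in [0, 2*pi)
    return angles

point = (1.0, -2.0, 0.5)
scaled_up = tuple(3.0 * v for v in point)       # positive change of scale
reflected = tuple(-3.0 * v for v in point)      # negative change of scale

same = all(isclose(a, b) for a, b in zip(spherical_angles(point), spherical_angles(scaled_up)))
changed = any(not isclose(a, b) for a, b in zip(spherical_angles(point), spherical_angles(reflected)))
print(same, changed)
```

A positive multiple leaves every angle unchanged, while a negative multiple moves the point to the antipodal direction, so the angles distinguish x from -x in a way that the ratios $x_\alpha/x_n$ of Theorem Ib do not.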

The next three results are generalizations of Theorems Ia, b, c, to analogous 
multivariate situations. The first of these is a direct generalization of 
Theorem Ia. 

THEOREM IIa: Let $x_1, \dots, x_k$ have joint distribution $F(x_1, \dots, x_k)$ such
that the complement of the set S of zeros of $\int e^{i\sum t_r x_r}\, dF(x_1, \dots, x_k)$ is $\epsilon$-connected,
where $\epsilon$ is the g.l.b. of |t| for (t) in S, and let $y_1, \dots, y_k$ have joint distribution
$G(y_1, \dots, y_k)$. Let $(x_1^\alpha, \dots, x_k^\alpha)$ and $(y_1^\alpha, \dots, y_k^\alpha)$, $\alpha = 1, \dots, n$, be samples
from these distributions, with $n \ge 3$. Then $w_{i\beta} = x_i^\beta - x_i^n$, i = 1, ..., k,
$\beta = 1, \dots, n - 1$, have the same joint distribution as the corresponding set $v_{i\beta} =
y_i^\beta - y_i^n$ if and only if there exist constants $a_1, \dots, a_k$ such that $y_1 + a_1, \dots,
y_k + a_k$ have the same joint distribution as $x_1, \dots, x_k$.

PROOF: Set

$\varphi(t_1, \dots, t_k) = \int e^{i\sum_{r=1}^{k} t_r x_r}\, dF(x_1, \dots, x_k)$,

$\psi(t_1, \dots, t_k) = \int e^{i\sum_{r=1}^{k} t_r y_r}\, dG(y_1, \dots, y_k)$.

If $w_{i\beta}$, i = 1, ..., k, $\beta$ = 1, 2, have the same joint distribution as $v_{i\beta}$, then,
as in the proof of Theorem Ia, we have

(6) $\varphi(t_{11}, \dots, t_{k1})\varphi(t_{12}, \dots, t_{k2})\varphi(-t_{11} - t_{12}, \dots, -t_{k1} - t_{k2})$
$= \psi(t_{11}, \dots, t_{k1})\psi(t_{12}, \dots, t_{k2})\psi(-t_{11} - t_{12}, \dots, -t_{k1} - t_{k2})$.

Again, as before, $|\varphi| = |\psi|$; $\varphi(t_1, \dots, t_k)$ and $\psi(t_1, \dots, t_k)$ are continuous;
$\varphi(0, 0, \dots, 0) = \psi(0, 0, \dots, 0) = 1$. There will exist a neighborhood N of
(0, 0, ..., 0) such that for $(t_1, \dots, t_k) \in N$ the function $f(t_1, \dots, t_k) =
\varphi(t_1, \dots, t_k)/\psi(t_1, \dots, t_k)$ is defined and continuous. Then there will exist a neighborhood
$N' \subset N$ such that in N' there exists a uniquely determined branch of
$\arg f(t_1, \dots, t_k)$, continuous in N', and such that if $(t_1, \dots, t_k) \in N'$ and
$(u_1, \dots, u_k) \in N'$ then $\arg f(t_1 + u_1, \dots, t_k + u_k)$ is also uniquely determined
and continuous. For $(t) \in N'$ and $(u) \in N'$, arg f satisfies the relation

$\arg f(t_1, \dots, t_k) + \arg f(u_1, \dots, u_k) = \arg f(t_1 + u_1, \dots, t_k + u_k)$.

It is easily shown that any continuous function satisfying the equation above
must be of the form $\sum a_r t_r$, therefore

(7) $\varphi(t_1, \dots, t_k) = e^{i\sum_{r=1}^{k} a_r t_r}\, \psi(t_1, \dots, t_k)$; $(t) \in N'$.

Just as in the proof of Ia the relation (7) may be extended, by use of (6), to
hold for all t. This implies, finally, that the set $\{y_i + a_i\}$ have the same
joint distribution as the set $\{x_i\}$.

Theorem IIb is a generalization of Theorem Ib to multivariate distributions.


THEOREM IIb: Let $x_1, \dots, x_k$ have distribution $F(x_1, \dots, x_k)$ such that the
zeros of $\int e^{i \sum_r t_r \log |x_r|} \, dF(x_1, \dots, x_k)$ are nowhere dense, and let $y_1, \dots, y_k$ have
distribution $G(y_1, \dots, y_k)$. Let $(x_1^\alpha, \dots, x_k^\alpha)$ and $(y_1^\alpha, \dots, y_k^\alpha)$, $\alpha = 1, \dots, n$,
be samples, with $n \ge 3$. Then the set $w_{i\beta} = x_i^\beta / x_i^n$, $i = 1, \dots, k$, $\beta = 1, \dots,
n - 1$, have the same joint distribution as the corresponding set $v_{i\beta} = y_i^\beta / y_i^n$ if
and only if there exist constants $c_1, \dots, c_k$ such that the set $c_i y_i$ have the same
distribution as the $x_i$.
PROOF: The demonstration is parallel to that of Theorem Ib. By Theorem
IIa there exist $a_1, \dots, a_k$ such that

$$E\left( e^{i \sum_r t_r \log |x_r|} \right) = E\left( e^{i \sum_r t_r (\log |y_r| + a_r)} \right).$$

Set $z_r = e^{a_r} y_r$, then

$$\int e^{i \sum_r t_r \log |x_r|} \, dF(x_1, \dots, x_k) = \int e^{i \sum_r t_r \log |x_r|} \, dH(x_1, \dots, x_k), \tag{8}$$

where $(z_1, \dots, z_k)$ have distribution function $H(x_1, \dots, x_k)$.
We shall continue the proof from here under the assumption that k = 2. 


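It will be evident how the proof goes for any $k$. We have, since $z_r^\beta / z_r^3$ have the
same joint distribution as $x_r^\beta / x_r^3$,

$$\begin{aligned}
&\iiint e^{i \sum_{r,\beta} t_{r\beta} (\log |x_r^\beta| - \log |x_r^3|)} \operatorname{sgn}\left( \frac{x_1^1}{x_1^3} \right) \operatorname{sgn}\left( \frac{x_1^2}{x_1^3} \right) dF(x_1^1, x_2^1) \, dF(x_1^2, x_2^2) \, dF(x_1^3, x_2^3) \\
&\quad = \iiint e^{i \sum_{r,\beta} t_{r\beta} (\log |x_r^\beta| - \log |x_r^3|)} \operatorname{sgn}\left( \frac{x_1^1}{x_1^3} \right) \operatorname{sgn}\left( \frac{x_1^2}{x_1^3} \right) dH(x_1^1, x_2^1) \, dH(x_1^2, x_2^2) \, dH(x_1^3, x_2^3).
\end{aligned} \tag{9}$$

Both members of (9) are evaluated as products, just as was done in previous
proofs, and from the result, combined with (8), we conclude, as in Theorem Ib,
that

$$\iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \, dF(x_1, x_2) = s_1 \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \, dH(x_1, x_2),$$

where $s_1 = \pm 1$, for all $(t_1, t_2)$. Similarly

$$\iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_2 \, dF(x_1, x_2) = s_2 \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_2 \, dH(x_1, x_2)$$

and

$$\iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2 \, dF(x_1, x_2) = s_3 \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2 \, dH(x_1, x_2),$$

with $s_2 = \pm 1$, $s_3 = \pm 1$.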
a 
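Set

$$\varphi_1(t_1, t_2) = \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \, dF(x_1, x_2),$$

$$\varphi_2(t_1, t_2) = \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_2 \, dF(x_1, x_2),$$

$$\varphi_{12}(t_1, t_2) = \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2 \, dF(x_1, x_2),$$

and let $\psi_1(t_1, t_2)$, $\psi_2(t_1, t_2)$, and $\psi_{12}(t_1, t_2)$ denote the corresponding transforms
of $H(x_1, x_2)$. We have

$$\begin{aligned}
\varphi_1(t_1, t_2) &= s_1 \psi_1(t_1, t_2), \\
\varphi_2(t_1, t_2) &= s_2 \psi_2(t_1, t_2), \\
\varphi_{12}(t_1, t_2) &= s_3 \psi_{12}(t_1, t_2),
\end{aligned} \tag{10}$$

with $s_1 = \pm 1$, $s_2 = \pm 1$, and $s_3 = \pm 1$.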


Now, as in (9), by considering

$$E\left[ e^{i \sum_{r,\beta} t_{r\beta} (\log |x_r^\beta| - \log |x_r^3|)} \operatorname{sgn}\left( \frac{x_1^1}{x_1^3} \right) \operatorname{sgn}\left( \frac{x_2^2}{x_2^3} \right) \right]$$

we obtain the relation

$$\begin{aligned}
&\varphi_1(t_{11}, t_{21}) \, \varphi_2(t_{12}, t_{22}) \, \varphi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22}) \\
&\quad = \psi_1(t_{11}, t_{21}) \, \psi_2(t_{12}, t_{22}) \, \psi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22}),
\end{aligned}$$

showing that $s_1$, $s_2$, $s_3$ may be chosen so that $s_1 s_2 s_3 = 1$, that is, $s_1 s_2 = s_3$.

Consider now the variates $z'_r = s_r z_r$, $r = 1, 2$. Let $K(x_1, x_2)$ be the distribution
function of $z'_1$, $z'_2$. If we let $\theta_1(t_1, t_2)$, $\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$ be the
transforms of $K$ which correspond to $\varphi_1(t_1, t_2)$, $\varphi_2(t_1, t_2)$, and $\varphi_{12}(t_1, t_2)$ respectively,
it is evident that

$$\begin{aligned}
\varphi_1(t_1, t_2) &= \theta_1(t_1, t_2), \\
\varphi_2(t_1, t_2) &= \theta_2(t_1, t_2), \\
\varphi_{12}(t_1, t_2) &= \theta_{12}(t_1, t_2).
\end{aligned} \tag{11}$$

Moreover, from (8),

$$\iint e^{i \sum_r t_r \log |x_r|} \, dF(x_1, x_2) = \iint e^{i \sum_r t_r \log |x_r|} \, dK(x_1, x_2).$$

The last relation, together with the equations (11), implies that $F(x)$ and $K(x)$
coincide in each quadrant, thus $F(x_1, x_2) = K(x_1, x_2)$ for all $x_1$, $x_2$.

The final result is that $z'_1$, $z'_2$ have the same distribution as $x_1$, $x_2$, i.e., $s_1 e^{a_1} y_1$
and $s_2 e^{a_2} y_2$ have the same joint distribution as $x_1$ and $x_2$.

The next result bears the same relation to Theorem IIb that Theorem Ic 
bears to Theorem Ib, that is, only positive scale changes are to be permitted. 


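THEOREM IIc: Let $x_1, \dots, x_k$ have distribution $F(x_1, \dots, x_k)$ such that the
zeros of $\int e^{i \sum_r t_r \log |x_r|} \, dF(x_1, \dots, x_k)$ are nowhere dense, and let $y_1, \dots, y_k$ have
distribution $G(y_1, \dots, y_k)$. Let $(x_1^\alpha, \dots, x_k^\alpha)$ and $(y_1^\alpha, \dots, y_k^\alpha)$, $\alpha = 1, 2,
\dots, n$, be samples with $n \ge 3$. Express $x_i^1, \dots, x_i^n$ and $y_i^1, \dots, y_i^n$ in spherical
coordinates

$$\begin{aligned}
x_i^1 &= r_i \cos \theta_i^1, & y_i^1 &= R_i \cos \varphi_i^1, \\
x_i^2 &= r_i \sin \theta_i^1 \cos \theta_i^2, & y_i^2 &= R_i \sin \varphi_i^1 \cos \varphi_i^2, \\
&\;\;\vdots & &\;\;\vdots \\
x_i^n &= r_i \sin \theta_i^1 \cdots \sin \theta_i^{n-1}; & y_i^n &= R_i \sin \varphi_i^1 \cdots \sin \varphi_i^{n-1}.
\end{aligned}$$

Then $\{\theta_i^\beta\}$, $i = 1, \dots, k$, $\beta = 1, \dots, n - 1$, have the same joint distribution
as $\{\varphi_i^\beta\}$ if and only if there exist constants $k_i > 0$, $i = 1, \dots, k$, such that the
set $k_i y_i$ have the same joint distribution as the set $x_i$.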


PROOF: If $\{\theta_i^\beta\}$ have the same distribution as $\{\varphi_i^\beta\}$ then it follows that $\{x_i^\beta / x_i^n\}$
have the same distribution as $\{y_i^\beta / y_i^n\}$, hence by Theorem IIb there exist constants
$c_i$ such that $\{c_i y_i\}$ have the same distribution as $\{x_i\}$. Set $z_i = |c_i| y_i$; we
wish to show that $\{z_i\}$ have the same distribution as $\{x_i\}$. By equation (8)
in Theorem IIb it is known that $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$;
moreover, if we express $z_i^\alpha$ in spherical coordinates, the angular coordinates are
the same as those of $y_i^\alpha$, therefore $\{z_i^\beta / z_i^n\}$ have the same distribution as $\{x_i^\beta / x_i^n\}$,
since these functions are obtainable in terms of the angular coordinates.
As before, we shall continue the proof from here under the assumption that
$k = 2$. The procedure is a generalization of the procedure in the proof of
Theorem Ic. $\operatorname{sgn} x_i^1 = \operatorname{sgn} \cos \theta_i^1$, and similarly for $y$, therefore

$$\begin{aligned}
&\iint e^{i \sum_r t_r (\log |x_r^1| - \log |x_r^2|)} \operatorname{sgn} x_i^1 \, dF(x_1^1, x_2^1) \, dF(x_1^2, x_2^2) \\
&\quad = \iint e^{i \sum_r t_r (\log |x_r^1| - \log |x_r^2|)} \operatorname{sgn} x_i^1 \, dH(x_1^1, x_2^1) \, dH(x_1^2, x_2^2), \qquad i = 1, 2,
\end{aligned} \tag{12}$$

where it is assumed that $z_1$, $z_2$ have distribution $H(x_1, x_2)$. As before, set

$$\varphi(t_1, t_2) = \iint e^{i \sum_r t_r \log |x_r|} \, dF(x_1, x_2),$$

$$\varphi_i(t_1, t_2) = \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_i \, dF(x_1, x_2), \qquad i = 1, 2,$$

$$\varphi_{12}(t_1, t_2) = \iint e^{i \sum_r t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2 \, dF(x_1, x_2),$$

and denote the corresponding transforms of $H(x_1, x_2)$ by $\theta(t_1, t_2)$, $\theta_1(t_1, t_2)$,
$\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$. It has been remarked already that $\{|z_i|\}$ have the
same distribution as $\{|x_i|\}$, therefore $\theta(t_1, t_2) = \varphi(t_1, t_2)$. Equation (12) yields
the relation $\varphi_i(t_1, t_2) \varphi(-t_1, -t_2) = \theta_i(t_1, t_2) \theta(-t_1, -t_2)$, $i = 1, 2$; the zeros of
$\varphi(t_1, t_2)$ are nowhere dense, so that it can be concluded that $\varphi_i(t_1, t_2) = \theta_i(t_1, t_2)$,
$i = 1, 2$. Now, from an equation similar to (12) we obtain $\varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2)$.
As in Theorem IIb, the four relations above together imply that $F(x_1, x_2) =
H(x_1, x_2)$, in other words, $\{|c_i| y_i\}$ have the same distribution as $\{x_i\}$.
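
The difference between Theorems IIb and IIc (scale changes of either sign versus positive scale changes only) can be seen on a single vector of observations $(x_i^1, \dots, x_i^n)$. The Python sketch below is an illustration under the spherical-coordinate convention of Theorem IIc; the function names `angular_coordinates` and `from_spherical` and the sample values are hypothetical.

```python
import math

def angular_coordinates(x):
    """Angles theta_1..theta_{n-1} with x_1 = r cos(theta_1),
    x_2 = r sin(theta_1) cos(theta_2), ..., x_n = r sin(theta_1)...sin(theta_{n-1})."""
    n = len(x)
    angles = []
    for k in range(n - 2):
        tail = math.sqrt(sum(v * v for v in x[k + 1:]))  # length of the remaining coordinates
        angles.append(math.atan2(tail, x[k]))            # lies in [0, pi]
    angles.append(math.atan2(x[n - 1], x[n - 2]))        # last angle lies in (-pi, pi]
    return angles

def from_spherical(r, angles):
    """Inverse map, used here only to check the convention."""
    x, prod = [], r
    for theta in angles:
        x.append(prod * math.cos(theta))
        prod *= math.sin(theta)
    x.append(prod)
    return x

sample = [1.2, -0.7, 2.5, -3.1]
theta = angular_coordinates(sample)
r = math.sqrt(sum(v * v for v in sample))
assert all(abs(a - b) < 1e-12 for a, b in zip(from_spherical(r, theta), sample))

# A positive scale change leaves every angle unchanged (Theorem IIc).
scaled = angular_coordinates([2.5 * v for v in sample])
assert all(abs(a - b) < 1e-12 for a, b in zip(theta, scaled))

# A negative scale change alters the angles but not the quotients x_beta / x_n (Theorem IIb).
flipped = [-2.5 * v for v in sample]
assert any(abs(a - b) > 1e-6 for a, b in zip(theta, angular_coordinates(flipped)))
assert all(abs(u / sample[-1] - v / flipped[-1]) < 1e-12
           for u, v in zip(sample[:-1], flipped[:-1]))
```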
We are now in a position to combine some of the preceding theorems so as to 
obtain analogous results for scale and location parameters together. 
THEOREM IIIa: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{itx} \, dF(x)$
satisfy the condition of Theorem Ia, and the zeros of

$$\iiint e^{i t_1 \log |x_1 - x_3| + i t_2 \log |x_2 - x_3|} \, dF(x_1) \, dF(x_2) \, dF(x_3)$$

are nowhere dense, and let $y$ have distribution $G(y)$. Let $x_1, \dots, x_n$ and
$y_1, \dots, y_n$ be samples, with $n \ge 9$. Then

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \dots, n - 2,$$

have the same joint distribution as the corresponding set

$$w'_\alpha = \frac{y_\alpha - y_n}{y_{n-1} - y_n}$$

if and only if there exist constants $a$, $c$, such that $c(y - a)$ and $x$ have the same distribution.

PROOF: Sufficiency of the condition is an immediate consequence of the fact
that $w'_\alpha$ is invariant under transformations of the form $y' = c(y - a)$. Assume
then that $\{w_\alpha\}$ and $\{w'_\alpha\}$ have the same joint distribution. By elementary
transformations it is evident that the functions

$$\frac{x_1 - x_3}{x_7 - x_9}, \quad \frac{x_4 - x_6}{x_7 - x_9}, \quad \frac{x_2 - x_3}{x_8 - x_9}, \quad \frac{x_5 - x_6}{x_8 - x_9}$$

have the same joint distribution as the corresponding functions of the $y$'s, if
$n \ge 9$. Since $x_1, \dots, x_n$ form a sample it follows that the pairs $\{x_1 - x_3,
x_2 - x_3\}$, $\{x_4 - x_6, x_5 - x_6\}$, $\{x_7 - x_9, x_8 - x_9\}$, have the same joint distributions
and are pairwise independent, and similarly for the corresponding functions
of the $y$'s. Theorem IIb assures the existence of constants $c_1$, $c_2$, such
that $c_1(y_1 - y_3)$, $c_2(y_2 - y_3)$ have the same joint distribution as $(x_1 - x_3)$,
$(x_2 - x_3)$. Considering separately the marginal distributions it is seen that
$c_1(y_1 - y_3)$ has the same distribution as $c_2(y_2 - y_3)$. $y_1 - y_3$ and $y_2 - y_3$ have
the same distribution, therefore either $c_2 = c_1$, or $c_2 = -c_1$. Set $u_\alpha = x_\alpha - x_3$,
$v_\alpha = c_1(y_\alpha - y_3)$, $\alpha = 1, 2$. We have, for the distributions of $(u_1, u_2)$ and
$(v_1, v_2)$, relations corresponding to (10) in Theorem IIb, with the additional
condition that $s_1 = s_2$, because of the symmetry in the variables. This implies
that either $(v_1, v_2)$ or $(-v_1, -v_2)$ have the same joint distribution as $(u_1, u_2)$,
that is, there exists $c$ such that $c(y_1 - y_3)$ and $c(y_2 - y_3)$ have the same joint
distribution as $x_1 - x_3$ and $x_2 - x_3$. Application of Theorem Ia now completes
the proof.

Just as before, there is an analogous situation when we consider angular
coordinates instead of quotients. The proof is immediate; the angular coordinates
determine the angular coordinates of $\{x_1 - x_3, x_2 - x_3\}$, $\{x_4 - x_6, x_5 - x_6\}$,
and $\{x_7 - x_9, x_8 - x_9\}$, arranged as a sample. Then the constants $c_1$, $c_2$ in
the proof of Theorem IIIa are both positive; it follows that $c_1 = c_2$. Application
of Theorem Ia gives

THEOREM IIIb: Let $x_1, \dots, x_n$ and $y_1, \dots, y_n$ satisfy the hypotheses of
Theorem IIIa. Set

$$\begin{aligned}
x_1 - x_n &= r \cos \theta_1, & y_1 - y_n &= r' \cos \theta'_1, \\
x_2 - x_n &= r \sin \theta_1 \cos \theta_2, & y_2 - y_n &= r' \sin \theta'_1 \cos \theta'_2, \\
&\;\;\vdots & &\;\;\vdots \\
x_{n-1} - x_n &= r \sin \theta_1 \cdots \sin \theta_{n-2}; & y_{n-1} - y_n &= r' \sin \theta'_1 \cdots \sin \theta'_{n-2}.
\end{aligned}$$

Then $\theta_1, \dots, \theta_{n-2}$ have the same joint distribution as $\theta'_1, \dots, \theta'_{n-2}$ if and only if
there exist constants $a$ and $c > 0$ such that $c(y - a)$ has the same distribution as $x$.

Theorem IVa is a generalization of Theorem Ia to cover arbitrary linear com- 
binations of some subset of the sample. 


THEOREM IVa: Suppose $x$ has distribution $F(x)$ such that $\int e^{itx} \, dF(x)$ does not
vanish, and let $y$ have distribution $G(y)$. Consider the functions

$$w_\alpha = x_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta} x_{m+\beta}, \qquad w'_\alpha = y_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta} y_{m+\beta},$$

$\alpha = 1, 2, \dots, m$, $\beta = 1, 2, \dots, n - m$, and suppose that $m > n - m$.
Then, if $\{w'_\alpha\}$ have the same joint distribution
as $\{w_\alpha\}$ and if $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \ne 1$ for some $\alpha$, it follows that $F(y) = G(y)$; if
$\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$ there exists a constant $a$ such that $F(y - a) = G(y)$.
B 


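PROOF: Denote the characteristic functions of $x$ and $y$ by $\varphi(t)$ and $\psi(t)$ respectively.
By expressing the fact that $\{w_\alpha\}$ and $\{w'_\alpha\}$, $\alpha = 1, 2, \dots, n - m + 1$,
have the same characteristic function we obtain the functional equation

$$\prod_{\alpha=1}^{n-m+1} \varphi(t_\alpha) \prod_{\beta=1}^{n-m} \varphi\left( -\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta} t_\alpha \right) = \prod_{\alpha=1}^{n-m+1} \psi(t_\alpha) \prod_{\beta=1}^{n-m} \psi\left( -\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta} t_\alpha \right).$$

By hypothesis $\varphi(t)$ does not vanish, therefore $\psi(t)$ has no zeros, because of the
relation above. $\varphi(t)$ and $\psi(t)$ are continuous, thus the function $f(t) =
\log \varphi(t) - \log \psi(t)$ can be uniquely defined in a continuous manner for all $t$.
The equation above becomes

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\left( -\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta} t_\alpha \right) = 0. \tag{13}$$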


The constants $l_{\alpha\beta}$ are necessarily linearly dependent, so that, for some $\alpha$, $l_{\alpha\beta}$
can be expressed as a linear combination of the others; suppose then that

$$l_{n-m+1,\beta} = \sum_{\alpha=1}^{n-m} e_\alpha l_{\alpha\beta}.$$

Putting these values in (13) we have

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\left( -\sum_{\alpha=1}^{n-m} l_{\alpha\beta} (t_\alpha + e_\alpha t_{n-m+1}) \right) = 0. \tag{14}$$

It can be assumed that $e_1 \ne 0$, for, if $e_\alpha = 0$ for all $\alpha$, we have $l_{n-m+1,\beta} = 0$,
$\beta = 1, \dots, n - m$, that is, $w'_{n-m+1} = y_{n-m+1}$ and $w_{n-m+1} = x_{n-m+1}$, hence $x$
and $y$ have the same distribution. Assuming $e_1 \ne 0$, set $t_\alpha = -e_\alpha t_{n-m+1}$,
$\alpha = 2, \dots, n - m$, in (14), obtaining

$$f(t_1) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) + f(t_{n-m+1}) + \sum_{\beta=1}^{n-m} f\left( -l_{1\beta} (t_1 + e_1 t_{n-m+1}) \right) = 0; \tag{15}$$

now, recalling that $f(0) = 0$, set $t_{n-m+1} = 0$, getting $f(t_1) + \sum_{\beta=1}^{n-m} f(-l_{1\beta} t_1) = 0$.
Evaluating this with argument $t_1 + e_1 t_{n-m+1}$, and substituting back in (15) it
appears that

$$f(t_1) + f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) = f(t_1 + e_1 t_{n-m+1}). \tag{16}$$

Now setting $t_1 = 0$ in (16) we have the relation

$$f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) = f(e_1 t_{n-m+1}),$$

thus we have finally $f(t_1) + f(e_1 t_{n-m+1}) = f(t_1 + e_1 t_{n-m+1})$, or, since $e_1 \ne 0$,
$f(t_1 + t_2) = f(t_1) + f(t_2)$. The last relation implies that $f(t) = ct$, since $f$ is
continuous. Now replace $f(t)$ by $ct$ in (13), getting

$$c \left( \sum_{\alpha=1}^{n-m+1} t_\alpha - \sum_{\alpha=1}^{n-m+1} \sum_{\beta=1}^{n-m} l_{\alpha\beta} t_\alpha \right) = 0,$$

that is, either $c = 0$, or $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$. We conclude then that $\varphi(t) =
\psi(t)$, unless $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$. If $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$ we have $\varphi(t) = e^{ct} \psi(t)$.
$\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$, hence $c$ is of the form $c = ia$, where $a$ is real,
in other words $\varphi(t) = e^{iat} \psi(t)$, thus concluding the proof of the theorem.
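
The last step of the proof, that $f(t) = ct$ satisfies (13) for all $t_\alpha$ only when $c = 0$ or every row sum $\sum_\beta l_{\alpha\beta}$ equals 1, can be checked numerically. The Python sketch below is illustrative only; the function `lhs_13` and the two coefficient matrices are hypothetical examples with $n - m = 2$.

```python
def lhs_13(f, l, t):
    """Left member of (13): sum_a f(t_a) + sum_b f(-sum_a l[a][b] * t_a).

    l has n-m+1 rows (index a) and n-m columns (index b)."""
    rows, cols = len(l), len(l[0])
    total = sum(f(t[a]) for a in range(rows))
    for b in range(cols):
        total += f(-sum(l[a][b] * t[a] for a in range(rows)))
    return total

c = 2j                                   # c = ia with a real, as in the proof
linear = lambda u: c * u                 # candidate solution f(t) = ct

unit_rows = [[0.25, 0.75], [1.5, -0.5], [0.4, 0.6]]   # every row sums to 1
other_rows = [[0.25, 0.50], [1.5, -0.5], [0.4, 0.6]]  # first row sums to 0.75
t = [0.3, -1.2, 0.8]

# With unit row sums, f(t) = ct satisfies (13) although c != 0.
assert abs(lhs_13(linear, unit_rows, t)) < 1e-12
# Otherwise the left member equals c * sum_a t_a * (1 - row sum), which is not zero here.
assert abs(lhs_13(linear, other_rows, t) - c * 0.3 * 0.25) < 1e-12
```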

It was assumed in Theorem IVa that $\varphi(t)$ has no zeros. If $\varphi(t)$ has zeros
we have proved that, for an interval $|t| < \epsilon$, $\varphi(t) = \psi(t)$ (or $\varphi(t) = e^{iat} \psi(t)$).
This does not necessarily imply the result of Theorem IVa, but it does imply
at least that if the $k$th moments of $x$ and of $y$ (or of $y - a$) both exist they
are equal.

The last result in this series can be proved by methods similar to those used 
in Theorem IVa. 

THEOREM IVb: Let $x$ and $y$ satisfy the hypotheses of Theorem IVa. Suppose,
moreover, that $m > 2(n - m)$, that the rank of $\| l_{\alpha\beta} \|$ is $n - m$, and that
$\sum_{\beta=1}^{n-m} l_{\alpha\beta} \ne 1$ for at least $2m - n$ values of $\alpha$. Then, if there exist constants $\{c_\alpha\}$
such that the set $\{c_\alpha w'_\alpha\}$ have the same joint distribution as $\{w_\alpha\}$, it follows that,
for some $\alpha$, $c_\alpha y$ has the same distribution as $x$.


3. Application to Composite Hypotheses. The results of section 2 have a
significant application in the theory of testing composite hypotheses. Suppose
that $x$ has a distribution of the form $F(x, \theta_1, \theta_2)$, and that the hypothesis
$\theta_2 = \theta_2^0$ is to be tested, without reference to the value of $\theta_1$. We assume that
the parameters are independent, i.e., $F(x, \theta_1, \theta_2) = F(x, \theta'_1, \theta'_2)$ implies that
$\theta_1 = \theta'_1$ and $\theta_2 = \theta'_2$. It is true in a wide class of important cases that, given
a sample $x_1, \dots, x_n$ from the distribution $F(x, \theta_1, \theta_2)$, there exist functions
$y_\alpha(x_1, \dots, x_n)$, $\alpha = 1, 2, \dots, p$, such that $\{y_\alpha\}$ have joint distribution
independent of $\theta_1$, but depending on $\theta_2$. Now if the $\{y_\alpha\}$ are such that their joint
distribution redetermines the original distribution, except for $\theta_1$, one can reasonably
use the $p$-dimensional distribution of the $\{y_\alpha\}$ for testing the hypothesis
$\theta_2 = \theta_2^0$, thus reducing the composite hypothesis to a simple hypothesis. In
testing this simple hypothesis, every alternative hypothesis (corresponding to a
value of $\theta_2$) determines a distribution of $x$ among the alternatives $F(x, \theta_1, \theta_2)$
except for the unknown $\theta_1$, that is, there is a one-to-one correspondence between
the two sets of alternative hypotheses, expressed by the fact that if $\theta'_2 \ne \theta''_2$
then the distributions of the set $\{y_\alpha\}$ corresponding to $\theta_2 = \theta'_2$ and $\theta_2 = \theta''_2$
must be different.

Suppose, for example, that it is desired to test whether $y = x - a$ for some $a$
has the distribution $F(y, \theta^0)$, with the assumption that, for some $a$, $y$ has the
distribution $F(y, \theta)$. Given a sample one can form the set $w_\alpha = x_\alpha - x_n$,
$\alpha = 1, 2, \dots, n - 1$, obtaining the distribution $G(w_1, \dots, w_{n-1}, \theta)$; now
consider the simple hypothesis $\theta = \theta^0$, knowing that $G$ determines $\theta$, by Theorem Ia.
Similarly one can test whether $cx$, for some $c \ne 0$, has distribution $F(y, \theta^0)$,
by forming $w_\alpha = x_\alpha / x_n$, $\alpha = 1, \dots, n - 1$, or by expressing $(x_1, \dots, x_n)$
in spherical coordinates and considering the angular coordinates, according to
whether both positive and negative or only positive values of $c$ are to be allowed.
In the same way one can test the hypothesis $\theta = \theta^0$ under the assumption
that $c(x - a)$ has distribution $F(y, \theta)$ by forming

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \dots, n - 2,$$

or by expressing $(x_1 - x_n, \dots, x_{n-1} - x_n)$ in spherical coordinates and
considering the angular coordinates.
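
The three families of functions just described can be written down directly. The following Python sketch is an illustration only; the function names and the sample values are hypothetical, and the assertions merely confirm the invariance on which each reduction rests.

```python
def location_invariants(x):
    """w_a = x_a - x_n, a = 1..n-1 (unchanged by x -> x - a)."""
    return [v - x[-1] for v in x[:-1]]

def scale_invariants(x):
    """w_a = x_a / x_n, a = 1..n-1 (unchanged by x -> c x, c != 0)."""
    return [v / x[-1] for v in x[:-1]]

def location_scale_invariants(x):
    """w_a = (x_a - x_n) / (x_{n-1} - x_n), a = 1..n-2 (unchanged by x -> c (x - a))."""
    return [(v - x[-1]) / (x[-2] - x[-1]) for v in x[:-2]]

def close(u, v, tol=1e-12):
    return len(u) == len(v) and all(abs(a - b) < tol for a, b in zip(u, v))

sample = [2.0, -1.5, 0.7, 3.2, 1.1]
a, c = 4.0, -2.5   # arbitrary location and (negative) scale constants

assert close(location_invariants(sample), location_invariants([v - a for v in sample]))
assert close(scale_invariants(sample), scale_invariants([c * v for v in sample]))
assert close(location_scale_invariants(sample),
             location_scale_invariants([c * (v - a) for v in sample]))
```

When only positive values of $c$ are to be allowed, the angular coordinates take the place of the quotients, since quotients cannot detect a change of sign.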

Theorem IVa may be applied to analogous problems, in which the hypothesis
$\theta = \theta^0$ is to be tested under the assumption that $y = u - \sum a_i z_i$ has distribution
$F(y, \theta)$ for fixed values of the $z_i$, with the $a_i$ unknown. In such problems
there exist linear combinations of the observed values of $y$ which are independent
of the $a_i$. By Theorem IVa, under certain conditions the joint distribution of
these linear combinations determines the original distribution of $y$, without
regard to the $a_i$.


In applying some of the preceding results we must verify in certain cases that
the zeros of $\int e^{itx} \, dF(x)$ are nowhere dense, for a certain distribution function.
By a change of variable the condition of Theorem Ib can be stated in this form;
moreover if $F(x)$ satisfies this condition it is evident that it satisfies the condition
of Theorem Ia. A sufficient condition applicable to a considerable class
of cases has been obtained by Levinson [4]; if $f(x)$ is $O(e^{-\theta(x)})$ as $x \to \infty$, where
$\theta(x)$ is monotone and $\int_1^\infty \frac{\theta(x)}{x^2} \, dx$ diverges to $\infty$, then $\int e^{itx} f(x) \, dx$ cannot vanish
on an interval without vanishing identically. It is evident that it is likewise
sufficient if the corresponding condition holds as $x \to -\infty$ instead of $+\infty$. In
particular, if there exists $A$ such that $f(x) = 0$ for $x > A$ (or for $x < A$) it is a
consequence of the Levinson result that $\int e^{itx} f(x) \, dx$ has no intervals of zeros.
It can be established easily that if f(x) is majorized by | x | “*, € > 0, in the 
neighborhood of the origin, then $\int e^{it \log |x|} f(x) \, dx$ has no intervals of zeros.


As a simple example consider the rectangular distribution on (0, 1). Let
$(x - a)/r$ have this distribution with $a$ unknown, $r > 0$, and suppose that we
are interested only in $r$. Given a sample $x_1, \dots, x_n$ form the functions $y_\alpha =
(x_\alpha - x_n)/r$, $\alpha = 1, \dots, n - 1$. Set $y_M = \max(y_\alpha, 0)$, $y_m = \min(y_\alpha, 0)$.
Then it can be shown that $y_1, \dots, y_{n-1}$ have probability density $(1 - y_M + y_m)$
in the region $-1 < y_\alpha < 1$, $y_M - y_m < 1$, zero elsewhere. $Y = y_M - y_m$ is
of course the quotient of the sample range by $r$. It can be shown that $Y$ has
density $n(n - 1)(1 - Y) Y^{n-2} \, dY$. Theorem Ia makes it possible to base any
tests not involving $a$ on the distribution of the $y_\alpha$, since if the $y_\alpha$ have the
stated distribution then $(x - a)/r$ for some $a$ must have the rectangular
distribution.
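
The two densities quoted for the rectangular case can be examined numerically. The Python sketch below is an illustration, not part of the original argument: the function names, the seed, the sample size and the values of $a$ and $r$ are all arbitrary assumptions, and the simulation only checks the mean $(n-1)/(n+1)$ implied by the density $n(n-1)(1-Y)Y^{n-2}$.

```python
import random

def joint_density(y):
    """Density of y_a = (x_a - x_n)/r, a = 1..n-1, for a rectangular parent on (a, a + r)."""
    y_max = max(max(y), 0.0)
    y_min = min(min(y), 0.0)
    return max(0.0, 1.0 - y_max + y_min)

def range_density(v, n):
    """Density n(n-1)(1-Y)Y^(n-2) of Y = (sample range)/r."""
    return n * (n - 1) * (1.0 - v) * v ** (n - 2)

def sample_range(n, a, r, rng):
    """Draw one sample of size n and return Y = y_M - y_m."""
    x = [a + r * rng.random() for _ in range(n)]
    y = [(v - x[-1]) / r for v in x[:-1]]
    return max(max(y), 0.0) - min(min(y), 0.0)

n = 5
rng = random.Random(0)                       # fixed seed so the run is repeatable
draws = [sample_range(n, a=10.0, r=2.0, rng=rng) for _ in range(20000)]
mean_range = sum(draws) / len(draws)

# The simulated mean should be near (n-1)/(n+1), whatever the unknown location a.
print(mean_range, (n - 1) / (n + 1))
```

The location $a$ enters only through the generation of the $x_\alpha$ and cancels in the $y_\alpha$, which is the property that allows tests on $r$ to be based on the $y_\alpha$ alone.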

Similarly, suppose $y = (x - a)/r$ has the distribution $e^{-y}$, $y > 0$, for some
$a$, $r$. Then $w_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, 2, \dots, n - 1$, have distribution density
$\frac{1}{n} e^{-\sum_\alpha w_\alpha + n w_m}$, where $w_m = \min(0, w_\alpha)$. Again, the latter distribution may be
used to estimate $r$.

Let us examine the distributions of functions of the type considered, in the
case of normality. Assume that $x_1, \dots, x_n$ are a sample of $n$ observations
from a normal distribution with unit variance and unknown mean. The
variables $y_\alpha = x_\alpha - x_1$, $\alpha = 2, \dots, n$, have a joint normal distribution with
zero means and matrix of variances and covariances $\| A^{ij} \| = \| 1 + \delta_{ij} \|$.
Then Theorem Ia shows that if $\{y_\alpha\}$ have this joint distribution then $x$ is normally
distributed with unit variance. Note that $\chi^2_{n-1} = \sum A_{ij} y_i y_j = \sum (x_\alpha - \bar{x})^2$.
If we had $x = x'/\sigma$, then $\sum (x'_\alpha - \bar{x}')^2 = \sigma^2 \chi^2_{n-1}$, giving the estimate
$\frac{1}{n-1} \sum (x'_\alpha - \bar{x}')^2$ for $\sigma^2$.

There are, of course, many ways in which the matrix $\| A_{ij} \|$ may be transformed
into a diagonal matrix in order to obtain a new set of independently
distributed variates; one convenient set is the set $\sqrt{\tfrac{1}{2}} \, y_2$, $\sqrt{\tfrac{2}{3}} \, (y_3 - \tfrac{1}{2} y_2)$, $\dots$,
$\sqrt{\tfrac{n-1}{n}} \left( y_n - \tfrac{1}{n-1} \sum_{\alpha=2}^{n-1} y_\alpha \right)$. In terms of the original $x$'s we have $\sqrt{\tfrac{1}{2}} \, (x_2 - x_1)$,
$\sqrt{\tfrac{2}{3}} \, (x_3 - \tfrac{1}{2}(x_1 + x_2))$, $\dots$, $\sqrt{\tfrac{n-1}{n}} \left( x_n - \tfrac{1}{n-1} \sum_{\alpha=1}^{n-1} x_\alpha \right)$; these functions of the
data are independently distributed according to the normal distribution with
zero mean and unit variance.

Similarly, in the case of a sample $x_1, \dots, x_n$ from a normal distribution with
zero mean and unknown variance, there exists a set of $n - 1$ functions with
distributions independent of the variance. A convenient set of functions is
the set

$$t_m = \frac{\sqrt{m} \, x_{m+1}}{\sqrt{\sum_{i=1}^{m} x_i^2}}, \qquad m = 1, \dots, n - 1.$$

It is known (see Bartlett [1]) that the variables $t_m$ are independently distributed
according to Student $t$-distributions with $m$ degrees of freedom respectively.
The set $t_m$ determines the set of angular coordinates obtained by expressing
$x_1, \dots, x_n$ in spherical coordinates, hence we can conclude, conversely, that if
$\{t_m\}$ have this joint distribution then $x$ is normal with mean zero.
Finally we can eliminate both mean and variance. Suppose $x_1, \dots, x_n$ are a
sample from some normal distribution. The variables

$$w_m = \sqrt{\frac{m}{m+1}} \left( x_{m+1} - \frac{1}{m} \sum_{\alpha=1}^{m} x_\alpha \right), \qquad m = 1, 2, \dots, n - 1,$$

are normal and independent with mean zero and some variance. Then we have
the set

$$t_r = \frac{\sqrt{r} \, \sqrt{\frac{r+1}{r+2}} \left( x_{r+2} - \frac{1}{r+1} \sum_{j=1}^{r+1} x_j \right)}{\sqrt{\sum_{i=1}^{r} \frac{i}{i+1} \left( x_{i+1} - \frac{1}{i} \sum_{j=1}^{i} x_j \right)^2}}, \qquad r = 1, \dots, n - 2,$$

independently distributed according to $t$-distributions with $r$ degrees of freedom
respectively. It may be convenient for computational purposes to make use of
the identity

$$\sum_{i=1}^{r} \frac{i}{i+1} \left( x_{i+1} - \frac{1}{i} \sum_{j=1}^{i} x_j \right)^2 = \sum_{j=1}^{r+1} \left( x_j - \frac{1}{r+1} \sum_{i=1}^{r+1} x_i \right)^2 = \sum_{j=1}^{r+1} (x_j - \bar{x}_{(r+1)})^2.$$

We then have

$$t_r = \sqrt{\frac{r(r+1)}{r+2}} \; \frac{x_{r+2} - \bar{x}_{(r+1)}}{\sqrt{\sum_{i=1}^{r+1} (x_i - \bar{x}_{(r+1)})^2}}.$$

Now, by Theorem IIIb, we know that if the set $\{t_r\}$ has this specified distribution
then $x$ must be distributed according to some normal distribution. The set
$\{t_r\}$ may be used to test the goodness of fit of the observations to normality,
by first adjusting the set $\{t_r\}$ to a standard basis of comparison, i.e., by
considering $F_r(t_r)$, where $F_r$ is the corresponding cumulative distribution function,
and then applying, for example, a $\chi^2$ goodness of fit test to these $n - 2$
quantities, with respect to the rectangular distribution on (0, 1).
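
The computation of the set $\{t_r\}$ may be sketched as follows. This Python example is illustrative only: the function names and the data are hypothetical, and it checks no more than the agreement of the two forms of $t_r$ given above and their behavior under changes of location and scale. The further step of referring each $t_r$ to its cumulative distribution function is not shown, since it requires tables or a routine for the $t$-distribution.

```python
import math

def helmert_w(x):
    """w_m = sqrt(m/(m+1)) * (x_{m+1} - mean of x_1..x_m), m = 1..n-1."""
    return [math.sqrt(m / (m + 1)) * (x[m] - sum(x[:m]) / m) for m in range(1, len(x))]

def t_from_w(x):
    """t_r = sqrt(r) * w_{r+1} / sqrt(w_1^2 + ... + w_r^2), r = 1..n-2."""
    w = helmert_w(x)
    return [math.sqrt(r) * w[r] / math.sqrt(sum(v * v for v in w[:r]))
            for r in range(1, len(w))]

def t_closed_form(x):
    """The same statistics written with the mean and sum of squares of x_1..x_{r+1}."""
    out = []
    for r in range(1, len(x) - 1):
        head = x[:r + 1]
        mean = sum(head) / (r + 1)
        ss = sum((v - mean) ** 2 for v in head)
        out.append(math.sqrt(r * (r + 1) / (r + 2)) * (x[r + 1] - mean) / math.sqrt(ss))
    return out

data = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8]          # hypothetical observations
t1, t2 = t_from_w(data), t_closed_form(data)
assert all(abs(a - b) < 1e-12 for a, b in zip(t1, t2))   # the identity holds

# The t_r are unchanged by x -> c (x - a) with c > 0 and change sign when c < 0.
moved = t_from_w([3.0 * (v - 10.0) for v in data])
assert all(abs(a - b) < 1e-9 for a, b in zip(t1, moved))
mirrored = t_from_w([-3.0 * (v - 10.0) for v in data])
assert all(abs(a + b) < 1e-9 for a, b in zip(t1, mirrored))
```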
THE SELECTION OF VARIATES FOR USE IN PREDICTION WITH
SOME COMMENTS ON THE GENERAL PROBLEM OF
NUISANCE PARAMETERS


By Harold Hotelling


1. Maximum Correlation as a Test. For predicting or estimating a particular
variate y there is frequently available an embarrassingly large number of other 
variates having some correlation with y. For example, in fitting demand 
functions by means of economic time series, the number of series of observations 
having some relation to the demand which is sought to be estimated is apt to be 
very large, whereas the number of good independent observations on each is 
quite small. The proper coefficients in the regression equation must ordinarily 
be determined from the observations, and must not exceed in number the ob- 
servations on each variate. Furthermore, in order to have a measure of error 
that will make it possible to distinguish real effects from those due to chance, 
it is necessary that the number of predictors¹ shall be enough less than the
number of observations on each variate so that the residual chance variance 
can be determined with an appropriate degree of accuracy. It is desirable to 
select a set of predictors yielding estimates of maximum but determinable ac- 
curacy, and at the same time to avoid the fallacies of selection among numerous 
results of that one which appears most significant and treating it as if it were 
the only one examined. 

Considerations other than maximum and determinate accuracy are of prac- 
tical importance. The labor of calculation by the method of least squares 
becomes a serious obstacle to the use of the theoretically optimum set of vari- 
ates when these are very numerous, though the rapid current development of 
mechanical and electrical devices suitable for these computations offers a hope 
that the limits now set in practice in this way will soon be considerably increased. 
Furthermore, predictions or estimates must, as in speculative business or in 
military activity, be made from moment to moment, often in a rough manner 
by persons incapable of or averse to using complex formulae, and in such activi- 
ties frequent revisions of the regression equations must be made to accord with 
altered conditions. Also, in temporal predictions, the time of availability of 


¹ I use this term for what are often called the independent variates in a regression
equation, since these ordinarily are not really independent in the probability sense. Similarly
I shall call the "dependent" variate the predictand. By prediction I mean merely the
use of regression equations to estimate some unknown variate by means of the values of
related variates, without any necessary connotation of temporal order, though the most
interesting applications seem for the most part to be those in which we pass from a
knowledge of the past to an estimate of the future.

the values of the predictors is important, since an early prediction (e.g. of the
size of a harvest) is more valuable than a later one of the same accuracy. 

If we make the usual assumption² that the probability distribution of $y$ is,
for every set of values of the predictors, normal with a fixed variance $\sigma^2$ and an
expectation that is a linear function of the predictors, we shall wish to minimize
$\sigma$ subject to appropriate limitations, and this amounts to the same thing as
maximizing the multiple correlation $\rho$ of $y$ with the predictors, since $1 - \rho^2$ is
the ratio of $\sigma^2$ to the total variance of $y$, which is the same for all sets of predictors.
The estimates $s$ and $R$ of $\sigma$ and $\rho$ obtained from the available sample are of
course a different matter. But it is clear that the value of $R$ provides a suitable
criterion of choice under the following conditions: We are called upon to choose
one among two or more sets, each consisting of a fixed number of predictors;
for each predictor we have a known value corresponding to each of the values
$y_1, \dots, y_N$ observed for the predictand; and there is no basis for preferring one
of these sets to another either in theory, in observations extraneous to those just 
specified, or in cost or time of availability. In particular, if just one predictor is 
to be used, that having the highest sample correlation with the predictand should 
under these conditions be the one adopted. But in making such a choice a test 
of its accuracy is required, to take account of the possibility that the wrong 
choice has been made because of chance fluctuations in the sample correlation 
coefficients. 

There are innumerable economic variates available for prediction of 
business conditions, and most of these are highly correlated with each other. 
The selection of one business index instead of another for a particular
purpose will involve the question which has exhibited the higher correlation
with the quantity to be predicted, and consequently the question of the definite- 
ness with which the difference between the calculated correlations can be 
regarded as significant. 

Our problem evidently has a bearing on governmental policy in selecting 
among the numerous series of data those whose continuation will be most valu- 
able. The high cost of assembling these statistics dictates a careful selection of 
a limited number of series having little correlation with each others’ current 
values, but with correlations as great as possible with those things whose predic- 
tion or estimation is most important. 


2. The Choice of one Predictor with Two Available. Let us take first the
simplest case, which may be illustrated by a Michigan State College problem of 





² We shall not here go into the question of the applicability of these standard assumptions
to time series otherwise than to note that some transformations of observations
ordered in time are usually necessary and sufficient to obtain quantities satisfying the
assumptions so closely that deviations from them cannot be detected. Such transformations
include replacing a variate by its logarithm, and eliminating trend and seasonal
variations by least squares. In view of the satisfactory adjusted observations found
empirically by these and similar methods, the usual objections to studying time series by
exact methods seem much exaggerated.

which Dr. W. D. Baten has told me. The ultimate weight of a mature ox is
estimated by means of his length at an early age. The question has been raised,
however, whether a more accurate prediction might not be made by means of
the calf's girth at his heart. Records were at hand of 13 oxen showing their
lengths and girths as calves and also their weights when mature. A regression
equation involving both length and girth would presumably give greater accuracy
than either variate alone; but it appears that those who make the estimates
desire a simple formula involving only one variate. Suppose, then, that in such
a sample the correlation of weight with length is $r_1 = .7$, that the correlation
of weight with girth is $r_2 = .5$, and that the correlation of girth with length is
$r_0 = .4$. Is the difference $r_1 - r_2 = .2$ sufficiently great in relation to its sampling
errors to warrant the inference that length is really a better predictor than
girth, or must the question be left in abeyance until more observations can be
accumulated? 

A straightforward procedure which would have been used with little question
before the advent of modern exact methods is to calculate the asymptotic
approximation to the standard error of $r_1 - r_2$ by the differential method, assuming
the three variates to have the trivariate normal distribution, and to regard the
difference of the correlations as significant if it exceeds a multiple of this standard
error determined by the tables of the normal distribution. The calculation of
the asymptotic approximation $\sigma_{r_1 - r_2}$ may be carried out in the following manner.
Let $\rho_1$, $\rho_2$, and $\rho_0$ be the population values of $r_1$, $r_2$, and $r_0$ respectively. Then
if $\sigma_{ij}$ denote the population covariance of $x_i$ and $x_j$ ($i, j = 0, 1, 2$), we have

$$\rho_1 = \frac{\sigma_{01}}{\sqrt{\sigma_{00} \sigma_{11}}},$$

with similar formulae for $\rho_2$ and $\rho_0$. Likewise the sample estimates of these
parameters are given by such expressions as

$$r_1 = \frac{s_{01}}{\sqrt{s_{00} s_{11}}}.$$

Taking the logarithm of this last expression, expanding about the population
values, denoting by the operator $\delta$ the deviation of sample from population values
of the covariances, and the resultant deviation in $r_1$, and dropping terms of
order higher than the first, we have:

$$\frac{\delta r_1}{\rho_1} = \frac{\delta s_{01}}{\sigma_{01}} - \frac{\delta s_{00}}{2 \sigma_{00}} - \frac{\delta s_{11}}{2 \sigma_{11}}.$$

In the same way

$$\frac{\delta r_2}{\rho_2} = \frac{\delta s_{02}}{\sigma_{02}} - \frac{\delta s_{00}}{2 \sigma_{00}} - \frac{\delta s_{22}}{2 \sigma_{22}}.$$
The asymptotic value of the sampling covariance is obtained by multiplying
these two expressions together and taking the expectation. The sampling
covariance of two estimates of covariance of the usual kind (sum of products
divided by number of degrees of freedom) in the same sample, having $n$ degrees
of freedom (which ordinarily means that there are $n + 1$ individuals in the
sample and that the means are eliminated), is given exactly by the formula³

$$E(\delta s_{ij} \, \delta s_{km}) = (\sigma_{ik} \sigma_{jm} + \sigma_{im} \sigma_{jk}) / n,$$

in which the subscripts may have any values, equal or unequal. When this
formula is applied to each of the nine terms of the product and the results are
expressed in terms of the correlations $\rho_i$, there results the asymptotic expression
for the covariance given by

$$n E(\delta r_1 \, \delta r_2) = \tfrac{1}{2} \rho_1 \rho_2 (\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + \rho_0 (1 - \rho_1^2 - \rho_2^2).$$


This method provides also one of the derivations of the familiar formula which
may be written

$$n \sigma_{r_1}^2 = n E(\delta r_1)^2 = (1 - \rho_1^2)^2, \qquad n \sigma_{r_2}^2 = (1 - \rho_2^2)^2.$$

The variance of the difference of $r_1$ and $r_2$ is the sum of their variances minus
twice their covariance. Hence

$$n \sigma_{r_1 - r_2}^2 = (1 - \rho_1^2)^2 + (1 - \rho_2^2)^2 - \rho_1 \rho_2 (\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + 2 \rho_0 (\rho_1^2 + \rho_2^2 - 1).$$


We are testing the hypothesis that $\rho_1 = \rho_2$. If we put a common value $\rho$
for them in the last expression and simplify, we obtain for the standard error
of the difference,

$$\sigma_{r_1 - r_2} = \sqrt{\frac{(1 - \rho_0)(2 - 3\rho^2 + \rho_0 \rho^2)}{n}}.$$

The second factor in parentheses is always positive because of the inequalities 
limiting the correlations among three variates. 
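
The asymptotic formulae above may be evaluated as in the following Python sketch. It is an illustration only: the function names are hypothetical, the sample correlations of the oxen example are substituted for the unknown population values purely to show the arithmetic, and $n = 12$ is an assumption (13 oxen with the means eliminated).

```python
import math

def n_cov_r1_r2(rho1, rho2, rho0):
    """n times the asymptotic sampling covariance of r1 and r2."""
    return (0.5 * rho1 * rho2 * (rho1 ** 2 + rho2 ** 2 + rho0 ** 2 - 1.0)
            + rho0 * (1.0 - rho1 ** 2 - rho2 ** 2))

def n_var_difference(rho1, rho2, rho0):
    """n times the asymptotic variance of r1 - r2: variances minus twice the covariance."""
    return ((1.0 - rho1 ** 2) ** 2 + (1.0 - rho2 ** 2) ** 2
            - 2.0 * n_cov_r1_r2(rho1, rho2, rho0))

def se_difference_common(rho, rho0, n):
    """Standard error of r1 - r2 when rho1 = rho2 = rho (the simplified form)."""
    return math.sqrt((1.0 - rho0) * (2.0 - 3.0 * rho ** 2 + rho0 * rho ** 2) / n)

# The simplified form agrees with the general one when rho1 = rho2.
n = 12
general = math.sqrt(n_var_difference(0.6, 0.6, 0.4) / n)
assert abs(general - se_difference_common(0.6, 0.4, n)) < 1e-12

# Plug-in arithmetic for the oxen example (r1 = .7, r2 = .5, r0 = .4).
se_plug_in = math.sqrt(n_var_difference(0.7, 0.5, 0.4) / n)
print(se_plug_in, (0.7 - 0.5) / se_plug_in)
```

The value printed depends on the substituted estimates of $\rho$ and $\rho_0$, and no limits of error are attached to it.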

This formula contains two unknown parameters, $\rho$ and $\rho_0$. The classical
procedure would be to substitute $r_1$, $r_2$ and $r_0$ respectively for $\rho_1$, $\rho_2$, and $\rho_0$ in the
previous formula, and use the resulting standard error expression as if the ratio
to it of $r_1 - r_2$ were normally distributed. A first modification, more in line with
modern ideas, would be to use some kind of average of $r_1$ and $r_2$ as an estimate
of both $\rho_1$ and $\rho_2$, since the null hypothesis tested is that these are equal. But
whatever sample estimates we substitute for $\rho$ and $\rho_0$, the formula remains
unsatisfactory, since no suitable limits of error are available. If instead of the
standard error we were to work out the exact distribution of $r_1 - r_2$ we should
still not be free from the difficulty. This exact distribution clearly involves
both $\rho$ and $\rho_0$, since its variance does so. Neither can we escape from the
trouble by using some function $z = f(r)$, such as the inverse hyperbolic tangent
suggested by R. A. Fisher, and considering the standard error of $z_1 - z_2 =$




3 T have given a derivation of this furmula from the characteristic function of the multi- 
variate normal distribution [1]. Numerous special cases appear in earlier literature. The 
derivation above is a simplification and improvement of several versions, appearing in 
the various early writings of Karl Pearson. 
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firs) — f(r2); for this standard error will have as the first term in its expansion 
in a series of powers of n‘ simply the product of the expression above for 
or,-r, by f’(p); and this must clearly involve both pp and p. 


3. Nuisance Parameters. This is not by any means the only statistical prob- 
lem in which unknown and undesired parameters enter into the distribution of 
the statistic which we should naturally use to test a hypothesis. Indeed, the 
early investigation which was perhaps most influential in setting the whole tone 
of modern statistical research was that [2] in which W. C. Gosset ("Student")
arrived at the exact distribution of the ratio of a deviation in the mean to the 
estimated standard error. The previous practice (which unfortunately survives 
today in some quarters, and is even taught to students without explaining its 
approximate character) was to neglect the sampling errors in the estimate of 
the unknown variance $\sigma^2$ and to treat the ratio as normally distributed with
unit variance. The rigorous derivation by Fisher [3] of the Student distribution
makes clear the manner in which the nuisance parameter $\sigma$ may in this, and in
some other, problems be eradicated from the distribution through integration, 
after altering the original statistic (the deviation in the mean) by dividing it 
by another statistic. The new statistic, the Student ratio, vanishes whenever 
the old statistic, the deviation in the mean, does so, and the same hypothesis 
is tested by both. This then is one way to get rid of a nuisance parameter: 
when you have a statistic estimating a parameter whose vanishing is in question, 
but whose distribution involves another parameter, alter the statistic by multi- 
plying or dividing by another statistic in such a way that the new function 
vanishes whenever the old one does so; and do this in such a way that the new 
distribution will be independent of the nuisance parameter. Unhappily, this 
method has been applied successfully only in particular cases, and no way to 
use it in the problem at hand has been found. 
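
The way the Student ratio escapes the nuisance parameter can be illustrated with a small computation. The Python sketch below is an illustration under stated assumptions, not a method given in the paper: the function name and the sample values are hypothetical, and the hypothesis tested is taken to be a zero mean. Multiplying every observation by a positive constant multiplies the standard deviation by that constant, yet leaves the ratio unchanged, so its distribution under the hypothesis cannot depend on the scale.

```python
import math

def student_ratio(sample, mu0=0.0):
    """Deviation in the mean divided by its estimated standard error."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((v - mean) ** 2 for v in sample) / (n - 1)  # unbiased variance estimate
    return (mean - mu0) / math.sqrt(s2 / n)

# Hypothetical observations, and the same observations on a scale ten times larger.
sample = [0.3, -1.2, 0.8, 2.1, -0.4, 1.5]
rescaled = [10.0 * v for v in sample]

t_original = student_ratio(sample)
t_rescaled = student_ratio(rescaled)
print(t_original, t_rescaled)  # the two ratios agree up to rounding
```

The deviation in the mean alone would be ten times larger in the second sample; only after division by the second statistic does the scale drop out.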

A second method is that of transformation employed by Fisher in dealing with 
such problems as testing the significance of the difference between the correla- 
tion coefficients in independent samples between the same two variates. The 
need for the transformation in this case is occasioned by the presence in the 
distribution of the difference of the sample correlations of the unknown true
value, which is not directly relevant to the comparison. We have seen that 
this method also fails to solve our problem. 

A third method of dealing with nuisance parameters is the use of fiducial 
probability by R. A. Fisher [4] and by Daisy M. Starkey [5] in testing the 
significance of the difference between the means of two samples when the 
variances may be unequal. Criticisms of these applications of fiducial probability 
have been made by M. S. Bartlett [6] and B. L. Welch [7], and the field of 
applicability of such methods is still in need of elucidation. 

Some findings of J. Neyman [8] having a bearing on the general nuisance 
parameter problem should also be noted. 

The only other class of methods for dealing with nuisance parameters of which 
I am aware involves the comparison of the particular sample obtained, not with
the whole population of samples with which a comparison might be made if we 
knew the value of the troublesome parameter, but with a sub-population selected 
with reference to the sample in such a way that the distribution, in this sub- 
population, of the statistic used does not involve any unknown parameter. An 
example is the testing of significance of a regression coefficient. Thus if we 
suppose that a sample of values of x and y is drawn from a bivariate normal 
population, and calculate the regression coefficient b of y on x in the sample,
the distribution of b involves not only the population value $\beta$, but also the ratio
a of the variances in the population. Since this second parameter is unknown,
and can only be estimated from the sample, it is not possible to use the distribu-
tion of b in the whole population directly to test the significance of $b - \beta$.
What we do is to find the place of this difference, not in the whole population
of values in which both x and y are drawn at random, but in a sub-population
for which the values of x are the same as in our sample. We may alternatively
say that we limit the sub-population only to that for which the sum of the
squares of the deviations of the values of x from their mean is the same as in
our sample; the results are the same. The distribution in this sub-population
of the ratio of $b - \beta$ to its estimated standard error is of the Student form, with
no unknown parameters, and on this basis it is possible to make exact and
satisfactory tests and to set up fiducial limits for $\beta$. Another example is that
of contingency tables. The practice now accepted (after a controversy) for 
testing independence of two modes of classification, such as classification 
of persons according as they have or have not been vaccinated, and again ac- 
cording as they live through an epidemic or die, is to compare the observed 
contingency table, not with all possible contingency tables of the same numbers 
of rows and columns, but only with the possible contingency tables having 
exactly the same marginal totals as the observed table. 
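
The restriction to tables with the observed marginal totals can be made concrete for a 2 x 2 table. The Python sketch below is an illustration and not taken from the paper: it assumes the usual hypergeometric probability for a table given its margins under independence, and the function names and the counts are hypothetical. It lists every table sharing the margins of the observed one and attaches to each its probability within that sub-population.

```python
from math import comb

def tables_with_same_margins(a, b, c, d):
    """All 2x2 tables (a, b, c, d) having the same row and column totals."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return [(k, row1 - k, col1 - k, row2 - (col1 - k))
            for k in range(lowest, highest + 1)]

def table_probability(a, b, c, d):
    """Probability of the table among those with its margins, under independence."""
    row1, row2, col1 = a + b, c + d, a + c
    return comb(row1, a) * comb(row2, c) / comb(row1 + row2, col1)

# Hypothetical counts: rows = vaccinated / not vaccinated, columns = survived / died.
observed = (3, 1, 1, 3)
reference_set = tables_with_same_margins(*observed)
probabilities = [table_probability(*t) for t in reference_set]
print(list(zip(reference_set, probabilities)))
```

No unknown parameter enters `table_probability`; the probabilities of survival and of vaccination have been eliminated by fixing the margins.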


4. An Exact Solution. We shall solve the problem of the significance of the 
difference of $r_1$ and $r_2$ with the understanding that the meaning of significance
is to be interpreted by reference to the sub-population of possible samples for
which the predictors $x_1$ and $x_2$ have the same set of values as those observed in
the particular sample available. This procedure, besides yielding an exact
distribution without unknown parameters, has the advantage of relaxing the
stringency of the requirement of a trivariate normal distribution. We now make
only the assumptions customary in the method of least squares, that the pre-
dictand y has the univariate normal distribution for each set of values of $x_1$ and
$x_2$, independently for the different sets, with a common variance $\sigma^2$, and with
the expectation of y for a fixed pair of values of the predictors a linear function 
of these predictors. No assumption is involved regarding the distribution of 
the predictors, since we regard them as fixed in all the samples with which we 
compare our particular sample. The advantages of exactness and of freedom 
from the somewhat special trivariate normal assumption are attained at the
expense of sacrificing the precise applicability of the results to other sets of 
values of the predictors. 

Since the correlational properties are unchanged by additive and multiplica- 
tive constants, we may suppose that 


$$Sx_1 = 0 = Sx_2, \qquad Sx_1^2 = 1 = Sx_2^2, \tag{1}$$


where S stands for summation over a sample of N individuals. The notation
may be made more explicit by the adjunction of an additional subscript $\alpha$, vary-
ing from 1 to N, to denote the individual member of the sample, so that instead
of $Sx_1$, for example, we might write $Sx_{1\alpha}$. The omission of this additional
subscript is convenient and will usually leave no ambiguity when we deal with
sums, but it will be convenient to retain it in connection with individual values.
The correlation $r_0$ of $x_1$ with $x_2$ in all those samples we shall consider is, by (1),


$$r_0 = Sx_1x_2.$$


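Now consider the new quantities


$$x'_\alpha = \frac{x_{1\alpha} - x_{2\alpha}}{\sqrt{2(1 - r_0)}}, \qquad x''_\alpha = \frac{x_{1\alpha} + x_{2\alpha}}{\sqrt{2(1 + r_0)}}. \tag{2}$$

Evidently, from (1) and (2),

$$Sx' = 0 = Sx'', \qquad Sx'^2 = 1 = Sx''^2, \qquad Sx'x'' = 0. \tag{3}$$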


Since the mean value $E(y_\alpha)$ is a linear function of $x_{1\alpha}$ and $x_{2\alpha}$, $y_\alpha$ may, upon
subtracting a constant from all these expectations, be written


$$y_\alpha = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \Delta_\alpha, \tag{4}$$


where $\Delta_1, \ldots, \Delta_N$ are normally and independently distributed with variances
all equal to $\sigma^2$ and expectations zero. The assumption that $x_1$ and $x_2$ are equally
correlated with y in the population leads to the conclusion that $\beta_1 = \beta_2$; and
putting $\beta = \beta_1\sqrt{2(1 + r_0)}$, we then have from (4) and (2):

$$y_\alpha = \beta x''_\alpha + \Delta_\alpha. \tag{5}$$


Consequently, by (3),


$$Sx'y = Sx'_\alpha y_\alpha = \beta\,Sx'x'' + Sx'\Delta = Sx'\Delta;$$


and this function has a normal distribution with zero mean and variance $\sigma^2$.
If in the sample we work out a regression equation


$$Y = a + b'x' + b''x'',$$

the normal equations for determining $b'$ and $b''$ must by (3) take the simple forms


$$a = \bar{y}, \qquad b' = Sx'y, \qquad b'' = Sx''y.$$

From the general theory of least squares it is known that the sum of squares
of residuals is


$$Sv^2 = S(y - Y)^2 = Sy^2 - \bar{y}\,Sy - (Sx'y)^2 - (Sx''y)^2,$$


and that $Sv^2/\sigma^2$ has the $\chi^2$ distribution with $n = N - 3$ degrees of freedom,
independently both of $Sx'y$ and of $Sx''y$. From these facts it follows that


$$t = \frac{Sx'y}{\sqrt{Sv^2/n}} \tag{6}$$

has the Student distribution with n degrees of freedom. Since in accordance
with the foregoing definitions and (1) we have


$$Sx'y = \frac{(r_1 - r_2)\sqrt{S(y - \bar{y})^2}}{\sqrt{2(1 - r_0)}},$$

and since also it is known that

$$Sv^2 = S(y - \bar{y})^2\,\frac{D}{1 - r_0^2}, \qquad \text{where} \quad D = \begin{vmatrix} 1 & r_1 & r_2 \\ r_1 & 1 & r_0 \\ r_2 & r_0 & 1 \end{vmatrix},$$


(6) may be written


$$t = (r_1 - r_2)\sqrt{\frac{n(1 + r_0)}{2D}}. \tag{7}$$


The probability of a greater value of $|t|$ is given by tables of the Student
distribution with n = N — 3. If this probability is sufficiently small (which 
conventionally means less than .05, or sometimes .01) we have a corresponding 
degree of confidence that the variate chosen because of a higher correlation in 
the sample has actually a higher correlation than the other in the population. 
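
The test of this section can be sketched in a few lines of Python. What follows is an illustration under stated assumptions rather than a definitive implementation: the function names and the data are hypothetical, and the two routines simply transcribe the statistic in its two forms, once from the three sample correlations as $t = (r_1 - r_2)\sqrt{n(1 + r_0)/(2D)}$ with $n = N - 3$, and once as $Sx'y/\sqrt{Sv^2/n}$ from the standardized predictors. The two computations should agree up to rounding.

```python
import math

def centered_unit(x):
    """Shift and scale x so that its sum is 0 and its sum of squares is 1."""
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    norm = math.sqrt(sum(d * d for d in dev))
    return [d / norm for d in dev]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def t_from_correlations(r1, r2, r0, N):
    """t = (r1 - r2) * sqrt(n (1 + r0) / (2 D)), with n = N - 3."""
    n = N - 3
    D = 1 + 2 * r1 * r2 * r0 - r1**2 - r2**2 - r0**2  # 3x3 correlation determinant
    return (r1 - r2) * math.sqrt(n * (1 + r0) / (2 * D))

def t_from_regression(x1, x2, y):
    """t = Sx'y / sqrt(Sv^2 / n), using the orthogonal combinations x' and x''."""
    N = len(y)
    n = N - 3
    z1, z2 = centered_unit(x1), centered_unit(x2)
    r0 = dot(z1, z2)
    x_prime = [(a - b) / math.sqrt(2 * (1 - r0)) for a, b in zip(z1, z2)]
    x_second = [(a + b) / math.sqrt(2 * (1 + r0)) for a, b in zip(z1, z2)]
    y_bar = sum(y) / N
    total = sum((v - y_bar) ** 2 for v in y)
    b_prime, b_second = dot(x_prime, y), dot(x_second, y)
    residual = total - b_prime**2 - b_second**2  # sum of squares of residuals
    return b_prime / math.sqrt(residual / n)

# Hypothetical data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 8.0, 6.0]
y = [2.0, 1.0, 4.5, 3.0, 6.0, 4.0, 9.0, 6.5]

y_unit = centered_unit(y)
r1 = dot(centered_unit(x1), y_unit)
r2 = dot(centered_unit(x2), y_unit)
r0 = dot(centered_unit(x1), centered_unit(x2))

t_a = t_from_correlations(r1, r2, r0, len(y))
t_b = t_from_regression(x1, x2, y)
print(t_a, t_b)  # the two values should agree up to rounding
```

The resulting value would then be referred to a Student table with N - 3 degrees of freedom; the table lookup itself is left out of the sketch.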


5. The Selection of One Variate from Among Three or More. Suppose that 
we are to choose one of the variates $x_1, \ldots, x_p$ in order to predict y $(p < N - 1)$.
We choose the one having highest correlation, and wonder how much confidence
to place in this choice. We shall now determine the distribution of a function
suitable for testing the hypothesis that there is no real difference between any
pair of the correlations of $x_1, \ldots, x_p$ with y. Again we shall assume the values
of these predictors fixed, and look for the place of our particular sample among 
all samples having these values, with only y free to vary normally by chance. 

Let $a_{ij} = S(x_i - \bar{x}_i)(x_j - \bar{x}_j)$, and let $c_{ij}$ be the cofactor of $a_{ij}$ in the
determinant $a$ of these quantities, divided by $a$. Then

$$\sum_j a_{ij}c_{jk} = \delta_{ik} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \neq k. \end{cases} \tag{8}$$

Here $\sum$ stands for summation from 1 to p. Let

$$w_i = \frac{\sum_j c_{ij}}{\sum\sum c_{ij}}, \tag{9}$$

$$l_i = S(x_i - \bar{x}_i)y, \tag{10}$$

$$l = \sum w_j l_j. \tag{11}$$

From (9) it follows that

$$\sum w_i = 1. \tag{12}$$


From the hypothesis that y is in the population equally correlated with all the
$x_i$ it follows that $l_1, \ldots, l_p$ have equal expectations, which we may denote by
$\lambda$; and from (11) and (12) it follows that also $E(l) = \lambda$. Obviously


$$E(l_i - \lambda)(l_j - \lambda) = \sigma^2 a_{ij}, \tag{13}$$

where $\sigma^2$ is the variance of those values of y corresponding to a fixed set of
values of the x's. From (11), (13) and (9) we obtain


$$E(l - \lambda)^2 = \frac{\sigma^2}{\sum\sum c_{ij}}. \tag{14}$$


Since the $l_i$ are linear functions of the y's, they have the multivariate normal
distribution. From the theory of this distribution and the values (13) of the
covariances it follows that the distribution has the form


$$(2\pi)^{-p/2}\,a^{-1/2}\,\sigma^{-p}\,e^{-T/2\sigma^2}\,dl_1 \cdots dl_p,$$

where $a$ is the determinant of the $a_{ij}$'s, and


$$T = \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda).$$


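We may introduce linear functions $l'_1, \ldots, l'_p$ of $l_1 - \lambda, \ldots, l_p - \lambda$ such that
$T = l'^2_1 + \cdots + l'^2_p$, and such that $l'^2_p = (l - \lambda)^2 \sum\sum c_{ij}$. Now
$(l'^2_1 + \cdots + l'^2_{p-1})/\sigma^2$
has the $\chi^2$ distribution with $p - 1$ degrees of freedom. The numerator of this
expression equals


$$\begin{aligned} T - l'^2_p &= \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda) - (l - \lambda)^2 \sum\sum c_{ij} \\ &= \sum\sum c_{ij} l_i l_j - l^2 \sum\sum c_{ij} \\ &= \sum\sum c_{ij}(l_i - l)(l_j - l). \end{aligned}$$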


The penultimate form shows that this function is independent of $\lambda$; the last,
as a positive definite form in the deviations of the l’s from their weighted mean, 
shows that sufficiently large values of the expression will reveal with definiteness 
the inequality of the predicting powers of the p variates when this exists. 

It is well known that the regression coefficients of y upon the set of variates
$x_1, \ldots, x_p$ are completely independent of the sum of squares $Sv^2$ of residuals
from the regression equation. Since the l's are linear functions of these regres-
sion coefficients (namely the linear functions appearing in the normal equa-
tions), they also are independent of $Sv^2$. Hence, if we put


$$s_1^2 = \frac{\sum\sum c_{ij} l_i l_j - l^2 \sum\sum c_{ij}}{p - 1}, \qquad s_2^2 = \frac{Sv^2}{N - p - 1},$$


the ratio $F = s_1^2/s_2^2$ will, in case of equality of the correlations of the various
x's with y, have the variance ratio distribution with $n_1 = p - 1$ and $n_2 = N - p - 1$
degrees of freedom. When p = 2 this test reduces exactly to (7), as it
should, and $F = t^2$.
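
The computation of F can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the author's numerical procedure: the function names and the data are hypothetical, the predictors are first rescaled to unit sum of squares about their means (an assumption made here so that equal correlations correspond to equal expectations of the l's), and the inverse matrix is obtained by a plain Gauss-Jordan elimination. With two predictors the result is compared with the square of $t = (r_1 - r_2)\sqrt{n(1 + r_0)/(2D)}$, $n = N - 3$.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    return [row[p:] for row in aug]

def centered_unit(x):
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    norm = math.sqrt(sum(d * d for d in dev))
    return [d / norm for d in dev]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def selection_F(xs, y):
    """Return (F, n1, n2) for the hypothesis of equal correlations with y."""
    N, p = len(y), len(xs)
    zs = [centered_unit(x) for x in xs]             # standardized predictors (assumption)
    a = [[dot(zi, zj) for zj in zs] for zi in zs]   # the a_ij
    l = [dot(zi, y) for zi in zs]                   # the l_i
    c = invert(a)                                   # the c_ij
    c_total = sum(sum(row) for row in c)
    w = [sum(row) / c_total for row in c]           # weights summing to 1
    l_mean = dot(w, l)                              # weighted mean l
    quad = sum(c[i][j] * l[i] * l[j] for i in range(p) for j in range(p))
    y_bar = sum(y) / N
    total = sum((v - y_bar) ** 2 for v in y)
    s1_sq = (quad - l_mean**2 * c_total) / (p - 1)
    s2_sq = (total - quad) / (N - p - 1)            # residual sum of squares / d.f.
    return s1_sq / s2_sq, p - 1, N - p - 1

# Hypothetical data with p = 2, compared with the t statistic of the previous section.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 8.0, 6.0]
y = [2.0, 1.0, 4.5, 3.0, 6.0, 4.0, 9.0, 6.5]

F_two, n1, n2 = selection_F([x1, x2], y)
r1 = dot(centered_unit(x1), centered_unit(y))
r2 = dot(centered_unit(x2), centered_unit(y))
r0 = dot(centered_unit(x1), centered_unit(x2))
D = 1 + 2 * r1 * r2 * r0 - r1**2 - r2**2 - r0**2
t_squared = (r1 - r2) ** 2 * (len(y) - 3) * (1 + r0) / (2 * D)
print(F_two, t_squared)  # should agree up to rounding
```

The residual sum of squares is taken here as the total sum of squares about the mean less the quadratic form in the l's, which is the usual least-squares decomposition.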

In the numerical application of this method, the regression coefficients $b_i$
of y on $x_1, \ldots, x_p$ should first be worked out by the inverse matrix method.
The right-hand members of the normal equations are $l_1, \ldots, l_p$, the coefficients
in these equations are the $a_{ij}$, and the calculation of $s_1^2$ is simplified with the
help of the identity


$$\sum\sum c_{ij} l_i l_j = \sum b_i l_i.$$


6. Selection of Additional Variates When Some Have Been Chosen. Sup- 
pose now that q predictors have been included definitely in the regression equa- 
tion, and that one more is to be selected for inclusion among p additional pre- 
dictors that are available. The criterion now is that that one should be chosen 
tentatively which has the highest partial correlation with the predictand, elimi- 
nating those already definitely chosen; but the confidence to be placed in the 
choice is to be judged by an adaptation of the criterion of the preceding section. 
It is only necessary to consider the $a_{ij}$, $l_i$, $c_{ij}$ and $b_i$ $(i, j = 1, \ldots, p)$ as cal-
culated from the new predictors and the deviations of y from the regression
equation on the predictors already adopted. Formulae may easily be derived
for the values of these quantities in terms of those already found and the sums
of products, so as to simplify the calculations. $Sv^2$ will now stand for the sum
of squares of residuals from the regression equation involving all the p + q
predictors. It is to be divided by $N - p - q - 1$ to obtain $s_2^2$. The numbers
of degrees of freedom with respect to which F is to be judged are now $n_1 = p - 1$
and $n_2 = N - p - q - 1$. When p = 2 this test, like that of the preceding
section, reduces to the use of the t-distribution of (7), with $n = N - q - 3$,
and the correlations standing for partial correlations eliminating the predictors 
already definitely chosen. 

A special instance in which this procedure is applicable is in economic time 
series, in which time, in the form of orthogonal polynomials, must ordinarily be 
"partialled out" in order that tests of significance may be sound.


7. Further Problems. It is natural to ask whether the foregoing work can be
extended to examine the soundness of the selection, on the basis of a greater 
multiple correlation, of a particular set of two or more variates, chosen from 
among several such sets. The simplest such problem that goes beyond what 
has been done above deals with two sets, each of two predictors, having in a 
sample multiple correlations R and R’ with the predictand. The question is 
whether the difference R — R’ is significant. 

Suppose that, in the interests of simplicity and the hope of attaining a solu- 
tion satisfactorily free from unknown parameters, we assume as before that the 
predictors have a fixed set of values, the same in all samples. Since multiple 
correlations are invariant under linear transformations of predictors, we may 
without loss of generality assume that the predictors in each set are mutually 
uncorrelated and have sums of squares equal to unity. Indeed, we may go 
somewhat further in standardizing the sets of values to which consideration can 
be confined without loss of generality, with the help of some ideas introduced 
in the paper [1]. In the terminology of that paper, the variates in each set may 
be considered canonical with respect to the relationship between the sets. This 
means that linear functions $x_1$ and $x_2$ of the two variates in one set, and linear
functions $x_1'$ and $x_2'$ of those in the other set, can be chosen so as to satisfy not
only the conditions


$$\begin{aligned} Sx_1 = Sx_2 = Sx_1' = Sx_2' &= 0, \\ Sx_1^2 = Sx_2^2 = Sx_1'^2 = Sx_2'^2 &= 1, \\ Sx_1x_2 = 0 &= Sx_1'x_2', \end{aligned} \tag{15}$$

but also the further conditions

$$Sx_1x_2' = 0 = Sx_2x_1'. \tag{16}$$


This means that, for all the purposes in view, the two sets of predictors can be 
characterized as to their mutual relationships by the values of the remaining 
two sums of products, namely 


$$c_1 = Sx_1x_1', \qquad c_2 = Sx_2x_2'.$$


In view of the conditions assumed earlier, $c_1$ and $c_2$ are what have been called
the canonical correlations between the two sets. 

To the sets thus standardized, the predictand y is related in a manner expressed
by the population regression coefficients $\beta_1$ and $\beta_2$ of y on the first set, and $\beta_1'$
and $\beta_2'$ on the second. If we take y as having unit variance in the population,
the squared multiple correlation coefficients in the two cases will be


$$\rho^2 = \beta_1^2 + \beta_2^2, \qquad \rho'^2 = \beta_1'^2 + \beta_2'^2.$$

The hypothesis to be tested is that $\rho = \rho'$. If $b_1$, $b_2$, $b_1'$, $b_2'$ denote the sample
estimates of the regression coefficients, the statistic appropriate for the test
would appear necessarily to be proportional to


$$w = \tfrac{1}{2}(b_1^2 + b_2^2 - b_1'^2 - b_2'^2).$$

The sample regression coefficients are normally distributed, with population
correlations equal to the sample correlations among the corresponding predictors.
The variance of each is $\sigma^2$. Thus their joint distribution may be written down
at once, in a rather simple form in view of (15) and (16). From this it is pos-
sible to determine directly the characteristic function $M(t) = E\,e^{tw}$ of w. If
we write $K(t) = \log M(t)$ we obtain:


$$2K(t) = \sum_j \{(\beta_j^2 - 2c_j\beta_j\beta_j' + \beta_j'^2)t^2 + (\beta_j^2 - \beta_j'^2)t\}\{1 - (1 - c_j^2)t^2\}^{-1} - \sum_j \log\{1 - (1 - c_j^2)t^2\}.$$


Here the summations are with respect to j over the values 1 and 2. If each set
of predictors had had s members, the same result would hold for K(t) except
that the summations with respect to j would then extend from 1 to s.

This is a very disappointing result because it contains so many parameters. 
The distribution of w must contain the same parameters as its characteristic 
function. All the four parameters $\beta_j$, $\beta_j'$ appear in the expression above, though
their effective number is reduced to three by the condition that the two sums
of squares shall be equal which constitutes the hypothesis under test. The
distribution of w thus contains at least three unknown parameters besides $\sigma$.

The estimate of variance $s^2$ obtained from the residuals from the grand re-
gression equation of y on $x_1$, $x_2$, $x_1'$, and $x_2'$ is independent of w. Its distribu-
tion is of the usual form and involves a parameter, the population variance,
which is a function of $\beta_1$, $\beta_2$, $\beta_1'$, and $\beta_2'$. We could therefore pass by a single
integration from the distribution of w to that of the statistic $w/s^2$, which vanishes
with w, and which on this account, and on grounds of physical dimensionality,
might be considered appropriate to test the hypothesis that $\rho = \rho'$. The ques-
tion may be raised whether the distribution of this ratio might not be free from
parameters. The answer unfortunately is in the negative, as appears from an
examination of the characteristic function of the ratio. Even in the simplified
case in which all the $c_j$ are equal, a troublesome parameter persists in the
distribution.

Thus we meet again the problem of nuisance parameters, and this time no 
escape is visible. Perhaps some such artifice as those enumerated in paragraph 
3 (for example, some further limitation of the sub-population within which we 
should seek the place of our particular sample) is capable of yielding an exact,
or "studentized" distribution, but this has not yet been found. The problem
is of considerable interest, not only because of its practical importance, but 
because of its suggestiveness in connection with general theory. 

Numerous other problems having both practical importance and general 
theoretical interest are associated with the selection of predictors. For example, 
we have not dealt at all with the problem of the number of predictors that 
should be used when maximum accuracy in prediction, or in evaluation of the 
regression coefficients, is the sole criterion. A particular case is the determina- 
tion of the degree of the regression polynomial which should be fitted to obtain 
maximum accuracy, for example of the number of orthogonal polynomials in
fitting a trend. Such customary criteria as minimizing the estimated variance 
of deviations, in which the sum of squares which is the numerator and the 
number of degrees of freedom which is the denominator both diminish to zero 
as the number of variates is increased, do not rest upon any satisfactory general 
theory. 

Another related set of problems is concerned with variates more numerous 
than the observations on each. It is clear that there is real information in- 
herent in data of this kind, but existing theory and methods, including those of 
the present paper, are not adequate to utilize it in a thoroughly efficient manner. 
A recent paper of P. L. Hsu [9] is unique in not excluding the case in which the 
variates outnumber the observations. 


8. Summary. A criterion has been obtained for judging the definiteness of 
the selection of a particular variate, from among several available for prediction, 
on the basis of its having the maximum sample correlation with the predictand. 
A variation of this criterion is applied in paragraph 6 to the problem of extending 
the list of variates to be used in a regression formula. 

Some of the problems of “nuisance parameters” which affect general theory 
are illustrated in this problem. Some outstanding unsolved problems related 
to these questions are discussed in paragraph 7. 


THE FITTING OF STRAIGHT LINES IF BOTH VARIABLES ARE
SUBJECT TO ERROR 


By ABRAHAM WALD 





1. Introduction. The problem of fitting straight lines if both variables x
and y are subject to error has been treated by many authors. If we have N > 2
observed points $(x_i, y_i)$ $(i = 1, \ldots, N)$, the usually employed method of least
squares for determining the coefficients a, b of the straight line y = ax + b
is that of choosing values of a and b which minimize the sum of the squares of
the residuals of the y's, i.e. $\sum (ax_i + b - y_i)^2$ is a minimum. It is well known
that treating y as an independent variable and minimizing the sum of the
squares of the residuals of the x's, we get a different straight line as best fit. It
has been pointed out¹ that if both variables are subject to error there is no
reason to prefer one of the regression lines described above to the other. For
obtaining the "best fit," which is not necessarily equal to one of the two lines
mentioned, new criteria have to be found. This problem was treated by R. J.
Adcock as early as 1877.²
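
That the two least-squares lines differ can be seen on any small data set. The Python sketch below is an illustration only; the function name and the observations are hypothetical. It returns the slope obtained by minimizing the residuals of the y's, and the slope of the line obtained by minimizing the residuals of the x's, the latter rewritten as a slope of y against x so that the two are comparable.

```python
def regression_slopes(x, y):
    """Return (slope of y on x, slope of the x-on-y line expressed as dy/dx)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope_y_on_x = sxy / sxx   # minimizes the sum of squared residuals in y
    slope_x_on_y = syy / sxy   # minimizes the residuals in x, rewritten as y against x
    return slope_y_on_x, slope_x_on_y

# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1]
first, second = regression_slopes(x, y)
print(first, second)
```

The ratio of the two slopes is the squared sample correlation, so the lines coincide only when the points lie exactly on a straight line.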

He defines the line of best fit as the one for which the sum of the squares of 
the normal deviates of the N observed points from the line becomes a minimum. 
(Another early attempt to solve this problem by minimizing the sum of squares 
of the normal deviates was made by Karl Pearson.³)

Many objections can be raised against this method. First, there is no justifi- 
cation for minimizing the sum of the squares of the normal deviates, and not 
the deviations in some other direction. Second, the straight line obtained by 
that method is not invariant under transformation of the coordinate system. 
It is clear that a satisfactory method should give results which do not depend 
on the choice of a particular coordinate system. This point has been emphasized
by C. F. Roos. He gives⁴ a good summary of the different methods and
then proposes a general formula for fitting lines (and planes in case of more than 
two variables) which do not depend on the choice of the coordinate system. 


1 See for instance Henry Schultz' "The Statistical Law of Demand," Jour. of Political
Economy, Vol. 33, Dec. (1925).

2 Analyst, Vol. IV, p. 183 and Vol. V, p. 53.

3 "On Lines and Planes of Closest Fit to Systems of Points in Space," Phil. Mag. 6th
Ser. Vol. II (1901).

4 "A General Invariant Criterion of Fit for Lines and Planes where all Variates are
Subject to Error," Metron, February 1937. See also Oppenheim and Roos, Bulletin of the
American Mathematical Society, Vol. 34 (1928), pp. 140-141.
Roos' formula includes many previous solutions⁵ as special cases. H. E. Jones⁶
gives an interesting geometric interpretation of Roos' general formula.

It is a common feature of Roos' general formula and of all other methods
proposed in recent years that the fitted straight line cannot be determined
without a priori assumptions (independent of the observations) regarding the
weights of the errors in the variables x and y. That is to say, the standard
deviations of the errors in x and in y (or at least their ratio) are involved
in the formula of the fitted straight line, and there is no method given
by which those standard deviations can be estimated by means of the observed
values of x and y.

R. Frisch⁷ has developed a new general theory of linear regression analysis,
when all variables are subject to error. His very interesting theory employs
quite new methods and is not based on probability concepts. Also on the basis
of Frisch's discussion it seems that there is no way of determining the "true"
regression without a priori assumptions about the disturbing intensities.

T. Koopmans⁸ combined Frisch's regression theory with the classical one in
a new general theory based on probability concepts. Also, according to his 
theory, the regression line can be determined only if the ratio of the standard 
deviations of the errors is known. 

In a recent paper R. G. D. Allen⁹ gives a new interesting method for determining
the fitted straight line in case of two variables x and y. Denoting by $\sigma_\epsilon$
the standard deviation of the errors in x, by $\sigma_\eta$ the standard deviation of the
errors in y and by $\rho$ the correlation coefficient between the errors in the two
variables, Allen emphasizes (p. 194)⁹ that the fitted line can be determined only
if the values of two of the three quantities $\sigma_\epsilon$, $\sigma_\eta$, $\rho$ are given a priori.

Finally I should like to mention a paper by C. Eisenhart,¹⁰ which contains
many interesting remarks related to the subject treated here. 

In the present paper I shall deal with the case of two variables x and y in 
which the errors are uncorrelated. It will be shown that under certain con- 
ditions: 

(1) The fitted straight line can be determined without making a priori assump- 
tions (independent of the observed values x and y) regarding the standard 
deviations of the errors. 

(2) The standard deviation of the errors can be well estimated by means of
the observed values of x and y. The precision of the estimate increases with
the number of the observations and would give the exact values if the number
of observations were infinite. (See in this connection also condition V in
section 3.)


5 For instance also Corrado Gini's method described in his paper, "Sull' Interpolazione
di una Retta Quando i Valori della Variable Independente sono Affecti da Errori Accidentalis,"
Metron, Vol. I, No. 3 (1921), pp. 63-82.

6 "Some Geometrical Considerations in the General Theory of Fitting Lines and Planes,"
Metron, February 1937.

7 Statistical Confluence Analysis by Means of Complete Regression Systems, Oslo, 1934.

8 Linear Regression Analysis of Economic Time Series, Haarlem, 1937.

9 "The Assumptions of Linear Regression," Economica, May 1939.

10 "The interpretation of certain regression methods and their use in biological and
industrial research," Annals of Math. Stat., Vol. 10 (1939), pp. 162-186.


2. Formulation of the Problem. Let us begin with a precise formulation of
the problem. We consider two sets of random variables¹¹


$$x_1, \ldots, x_N; \qquad y_1, \ldots, y_N.$$


Denote the expected value $E(x_i)$ of $x_i$ by $X_i$ and the expected value $E(y_i)$ of
$y_i$ by $Y_i$ $(i = 1, \ldots, N)$. We shall call $X_i$ the true value of $x_i$, $Y_i$ the true
value of $y_i$, $x_i - X_i = \epsilon_i$ the error in the i-th term of the x-set, and $y_i - Y_i = \eta_i$
the error in the i-th term of the y-set.

The following assumptions will be made: 

I. The random variables $\epsilon_1, \ldots, \epsilon_N$ each have the same distribution and they
are uncorrelated, i.e. $E(\epsilon_i\epsilon_j) = 0$ for $i \neq j$. The variance of $\epsilon_i$ is finite.

II. The random variables $\eta_1, \ldots, \eta_N$ each have the same distribution and are
uncorrelated, i.e. $E(\eta_i\eta_j) = 0$ for $i \neq j$. The variance of $\eta_i$ is finite.

III. The random variables $\epsilon_i$ and $\eta_j$ $(i = 1, \ldots, N;\ j = 1, \ldots, N)$ are
uncorrelated, i.e. $E(\epsilon_i\eta_j) = 0$.

IV. A single linear relation holds between the true values X and Y, that is to
say $Y_i = \alpha X_i + \beta$ $(i = 1, \ldots, N)$.

Denote by $\epsilon$ a random variable having the same probability distribution as
possessed by each of the random variables $\epsilon_1, \ldots, \epsilon_N$, and by $\eta$ a random
variable having the same distribution as $\eta_1, \ldots, \eta_N$.

The problem to be solved can be formulated as follows: 

We know only two sets of observations: $x_1', \ldots, x_N'$; $y_1', \ldots, y_N'$, where $x_i'$
denotes the observed value of $x_i$ and $y_i'$ denotes the observed value of $y_i$. We
know neither the true values $X_1, \ldots, X_N$; $Y_1, \ldots, Y_N$, nor the coefficients
$\alpha$ and $\beta$ of the linear relation between them. We have to estimate by means
of the observations $x_1', \ldots, x_N'$; $y_1', \ldots, y_N'$, (1) the values of $\alpha$ and $\beta$, (2) the
standard deviation $\sigma_\epsilon$ of $\epsilon$, and (3) the standard deviation $\sigma_\eta$ of $\eta$.

Problems of this kind occur often in Economics, where we are dealing with 
time series. For example, denote by $x_i$ the price of a certain good G in the
period $t_i$, and by $y_i$ the quantity of G demanded in $t_i$. In each time period $t_i$
there exists a normal price $X_i$ and a normal demand $Y_i$ which would obtain if
the influence of some accidental disturbances could be eliminated. If we have
reason to assume that there exists between the normal price and the normal
demand a linear relationship we have to deal with a problem of the kind
described above.

In the following discussions we shall use the notations $x_i$ and $y_i$ also for their
observed values $x_i'$ and $y_i'$ since it will be clear in which sense they are meant
and no confusion can arise.


11 A random or stochastic variable is a real variable associated with a probability
distribution.


3. Consistent Estimates of the Parameters $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. For the sake of
simplicity we assume that N is even. We consider the expressions


$$a_1 = \frac{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}{N}, \qquad a_2 = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{N}, \tag{1}$$


where m = N/2. As an estimate of $\alpha$ we shall use the expression


$$a = \frac{a_2}{a_1} = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}. \tag{2}$$

We make the assumption

V. The limit inferior of

$$\left| \frac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N} \right| \qquad (N = 2, 3, \ldots \text{ ad inf.})$$

is positive.

We shall prove that $a$ is a consistent estimate of $\alpha$, i.e. $a$ converges stochastically
to $\alpha$ with $N \to \infty$, if the assumptions I-V hold. Denote the expected
value of $a_1$ by $\bar a_1$ and the expected value of $a_2$ by $\bar a_2$. It is obvious that

$$\bar a_1 = \frac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N}, \qquad \bar a_2 = \frac{(Y_1 + \cdots + Y_m) - (Y_{m+1} + \cdots + Y_N)}{N}. \tag{3}$$

On account of the condition IV we have

$$\bar a_2 = \alpha \bar a_1, \quad \text{or} \quad \frac{\bar a_2}{\bar a_1} = \alpha. \tag{4}$$

The variance of $a_1 - \bar a_1$ is equal to $\sigma_\varepsilon^2/N$ and the variance of $a_2 - \bar a_2$ is equal
to $\sigma_\eta^2/N$. Hence $a_1$ and $a_2$ converge stochastically towards $\bar a_1$ and $\bar a_2$ respectively.


From that and assumption V it follows that also $a = a_2/a_1$ converges stochastically
towards $\bar a_2/\bar a_1 = \alpha$. The intercept $\beta$ of the regression line will be estimated by

$$b = \bar y - a \bar x, \quad \text{where } \bar x = \frac{x_1 + \cdots + x_N}{N} \text{ and } \bar y = \frac{y_1 + \cdots + y_N}{N}. \tag{5}$$

Denote by $\bar X$ the arithmetic mean of $X_1, \cdots, X_N$ and by $\bar Y$ the arithmetic
mean of $Y_1, \cdots, Y_N$. Since $\bar y$ converges stochastically towards $\bar Y$, $\bar x$ towards
$\bar X$, and $a$ towards $\alpha$, $b$ converges stochastically towards $\bar Y - \alpha \bar X$. From condition
IV it follows that $\bar Y - \alpha \bar X = \beta$. Hence $b$ converges stochastically
towards $\beta$.
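The estimates (2) and (5) can be sketched in a few lines of Python. This is an illustration only: the function name `wald_fit` and the sample data are assumptions, and the observations are taken to be already arranged so that the first half forms the first group.

```python
def wald_fit(x, y):
    """Grouping estimates of slope and intercept, following (1), (2) and (5).

    The first half of the observations forms the first group and the second
    half the second group; the ordering is left to the caller (an assumption
    of this sketch).
    """
    n = len(x)
    if n == 0 or n != len(y) or n % 2 != 0:
        raise ValueError("x and y must have the same even, positive length")
    m = n // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / n    # expression (1), x-part
    a2 = (sum(y[:m]) - sum(y[m:])) / n    # expression (1), y-part
    if a1 == 0:
        raise ZeroDivisionError("a1 = 0: the two groups have equal x-sums")
    a = a2 / a1                           # slope estimate (2)
    b = sum(y) / n - a * sum(x) / n       # intercept estimate (5)
    return a, b


# Hypothetical error-free data on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * v + 1.0 for v in xs]
a, b = wald_fit(xs, ys)
```

With error-free data the line is recovered exactly; with errors in both variables the estimates are only consistent, in the sense proved above.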
Let us introduce the following notations:

$$s_x = \sqrt{\frac{\sum (x_i - \bar x)^2}{N}} = \text{sample standard deviation of the } x\text{-observations},$$

$$s_y = \sqrt{\frac{\sum (y_i - \bar y)^2}{N}} = \text{sample standard deviation of the } y\text{-observations},$$

$$s_{xy} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{N} = \text{sample covariance between the } x\text{-set and } y\text{-set}.$$

$s_X$, $s_Y$ and $s_{XY}$ denote the same expressions of the true values $X_1, \cdots, X_N$;
$Y_1, \cdots, Y_N$.
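It is obvious that

$$E(s_x^2) = s_X^2 + \sigma_\varepsilon^2 \, \frac{N - 1}{N}, \tag{6}$$

$$E(s_y^2) = s_Y^2 + \sigma_\eta^2 \, \frac{N - 1}{N}, \tag{7}$$

$$E(s_{xy}) = s_{XY}, \tag{8}$$

where $E(s_x^2)$, $E(s_y^2)$, and $E(s_{xy})$ denote the expected values of $s_x^2$, $s_y^2$, and $s_{xy}$.¹²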
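Since $Y_i = \alpha X_i + \beta$, we have

$$s_Y^2 = \alpha^2 s_X^2, \tag{9}$$

$$s_{XY} = \alpha s_X^2. \tag{10}$$

From (8), (9) and (10) we get

$$s_X^2 = \frac{E(s_{xy})}{\alpha}, \tag{11}$$

$$s_Y^2 = \alpha E(s_{xy}). \tag{12}$$

If we substitute in (6) and (7) for $s_X^2$ and $s_Y^2$ their values in (11) and (12),
we get

$$\sigma_\varepsilon^2 = \left[ E(s_x^2) - \frac{E(s_{xy})}{\alpha} \right] \frac{N}{N - 1}, \tag{13}$$

$$\sigma_\eta^2 = \left[ E(s_y^2) - \alpha E(s_{xy}) \right] \frac{N}{N - 1}. \tag{14}$$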








12 I observe that the equations (6), (7) and (8) are essentially the same as those investigated
by R. Frisch, Statistical Confluence Analysis, pp. 51-52. See also Allen's equations
(4), l.c. p. 194.

Since $s_x^2$, $s_y^2$, $s_{xy}$ converge stochastically towards their expected values and $a$
converges stochastically towards $\alpha$, the expressions

$$\left[ s_x^2 - \frac{s_{xy}}{a} \right] \frac{N}{N - 1} \tag{15}$$

and

$$\left[ s_y^2 - a s_{xy} \right] \frac{N}{N - 1} \tag{16}$$

are consistent estimates of $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ respectively.
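Expressions (15) and (16) translate directly into code. The following Python sketch is illustrative; the function name and the data are assumptions. In small samples either expression may come out negative, since only consistency is claimed.

```python
def error_variance_estimates(x, y, a):
    """Expressions (15) and (16), given a slope estimate a (for instance from (2))."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sx2 = sum((v - xbar) ** 2 for v in x) / n                     # s_x^2
    sy2 = sum((v - ybar) ** 2 for v in y) / n                     # s_y^2
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / n  # s_xy
    var_eps = (sx2 - sxy / a) * n / (n - 1)    # expression (15)
    var_eta = (sy2 - a * sxy) * n / (n - 1)    # expression (16)
    return var_eps, var_eta


# Hypothetical error-free data: both estimates should vanish.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * v + 1.0 for v in xs]
var_eps, var_eta = error_variance_estimates(xs, ys, a=2.0)
```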


4. Confidence Interval for $\alpha$. In this section, as well as in sections 5 and 6,
only the assumptions I-IV are assumed to hold. In other words, all statements
made in these sections are valid independently of Assumption V, except
where the contrary is explicitly stated.

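Let us introduce the following notation:

$$\bar x_1 = \frac{x_1 + \cdots + x_m}{m}, \quad \bar x_2 = \frac{x_{m+1} + \cdots + x_N}{m}, \quad \bar y_1 = \frac{y_1 + \cdots + y_m}{m}, \quad \bar y_2 = \frac{y_{m+1} + \cdots + y_N}{m},$$

$$(s'_x)^2 = \frac{\sum_{i=1}^{m} (x_i - \bar x_1)^2 + \sum_{j=m+1}^{N} (x_j - \bar x_2)^2}{N},$$

$$(s'_y)^2 = \frac{\sum_{i=1}^{m} (y_i - \bar y_1)^2 + \sum_{j=m+1}^{N} (y_j - \bar y_2)^2}{N},$$

$$s'_{xy} = \frac{\sum_{i=1}^{m} (x_i - \bar x_1)(y_i - \bar y_1) + \sum_{j=m+1}^{N} (x_j - \bar x_2)(y_j - \bar y_2)}{N}.$$

$\bar X_1$, $\bar X_2$, $\bar Y_1$, $\bar Y_2$, $(s'_X)^2$, $(s'_Y)^2$ and $s'_{XY}$ denote the same functions of the true
values $X_1, \cdots, X_N$, $Y_1, \cdots, Y_N$. The expressions $s'_x$, $s'_y$, and $s'_{xy}$ are
slightly different from the corresponding expressions $s_x$, $s_y$, and $s_{xy}$. The
reason for introducing these new expressions is that the distributions of $s_x$,
$s_y$, and $s_{xy}$ are not independent of the slope $a = a_2/a_1$ of the sample regression
line, but $s'_x$, $s'_y$ and $s'_{xy}$ are distributed independently from $a$ (assuming that
$\varepsilon$ and $\eta$ are normally distributed). The latter statement follows easily from the
fact that according to (1) and (2) $a = \dfrac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2}$, and $s'_x$, $s'_y$, $s'_{xy}$ are distributed
independently of $\bar x_1$, $\bar x_2$, $\bar y_1$ and $\bar y_2$.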
In the same way as we derived (13) and (14), we get

$$\sigma_\varepsilon^2 = \left[ E(s'_x)^2 - \frac{E(s'_{xy})}{\alpha} \right] \frac{N}{N - 2}, \tag{13'}$$

$$\sigma_\eta^2 = \left[ E(s'_y)^2 - \alpha E(s'_{xy}) \right] \frac{N}{N - 2}. \tag{14'}$$

These formulae differ from the corresponding formulae (13) and (14) only in
the denominator of the second factor, having there $N - 2$ instead of $N - 1$.
This is due to the fact that the estimates $s_x$, $s_y$, $s_{xy}$ are based on $N - 1$ degrees
of freedom whereas $s'_x$, $s'_y$ and $s'_{xy}$ are based only on $N - 2$ degrees of freedom.
From (13') and (14') we get the following estimates¹³ for $\sigma_\varepsilon^2$ and $\sigma_\eta^2$:

$$\left[ (s'_x)^2 - \frac{s'_{xy}}{\alpha} \right] \frac{N}{N - 2}, \tag{17}$$

$$\left[ (s'_y)^2 - \alpha s'_{xy} \right] \frac{N}{N - 2}. \tag{18}$$


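Hence we get as an estimate of $\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2$ the expression:

$$s^2 = \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right] \frac{N}{N - 2} = \frac{\sum_{i=1}^{m} \left[ (y_i - \alpha x_i) - (\bar y_1 - \alpha \bar x_1) \right]^2 + \sum_{j=m+1}^{N} \left[ (y_j - \alpha x_j) - (\bar y_2 - \alpha \bar x_2) \right]^2}{N - 2}. \tag{19}$$

Now we shall show that

$$\frac{(N - 2)\, s^2}{\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2} \tag{20}$$

has the $\chi^2$-distribution with $N - 2$ degrees of freedom, provided that $\varepsilon$ and $\eta$
are normally distributed. In fact,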


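$$(y_i - \alpha x_i) - (\bar y_1 - \alpha \bar x_1) = \eta_i - \alpha \varepsilon_i - (\bar\eta_1 - \alpha \bar\varepsilon_1) \qquad (i = 1, \cdots, m)$$

and

$$(y_j - \alpha x_j) - (\bar y_2 - \alpha \bar x_2) = \eta_j - \alpha \varepsilon_j - (\bar\eta_2 - \alpha \bar\varepsilon_2) \qquad (j = m + 1, \cdots, N),$$

where

$$\bar\varepsilon_1 = \frac{\varepsilon_1 + \cdots + \varepsilon_m}{m}, \qquad \bar\varepsilon_2 = \frac{\varepsilon_{m+1} + \cdots + \varepsilon_N}{m},$$

$$\bar\eta_1 = \frac{\eta_1 + \cdots + \eta_m}{m}, \qquad \bar\eta_2 = \frac{\eta_{m+1} + \cdots + \eta_N}{m}.$$

Since the variance of $\eta_k - \alpha \varepsilon_k$ is equal to $\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2$ and since $\eta_k - \alpha \varepsilon_k$ is
uncorrelated with $\eta_l - \alpha \varepsilon_l$ ($k \neq l$) ($k, l = 1, \cdots, N$), the expression (20) has the
$\chi^2$-distribution with $N - 2$ degrees of freedom.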


13 An "estimate" is usually a function of the observations not involving any unknown
parameters. We designate here as estimates also some functions involving the parameter $\alpha$.

Now we shall show that

$$\frac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2}} \tag{21}$$

is normally distributed with zero mean and unit variance. In fact from the
equations (1)-(4) it follows that

$$a_1 (a - \alpha) = a_2 - \alpha a_1 = \alpha \bar a_1 + \frac{\bar\eta_1 - \bar\eta_2}{2} - \alpha \left( \bar a_1 + \frac{\bar\varepsilon_1 - \bar\varepsilon_2}{2} \right) = \frac{\bar\eta_1 - \bar\eta_2}{2} - \alpha \, \frac{\bar\varepsilon_1 - \bar\varepsilon_2}{2}.$$

Since the latter expression is normally distributed (provided that $\varepsilon$ and $\eta$ are
normally distributed) with zero mean and variance $\dfrac{\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2}{N}$, our statement
about (21) is proved.

Obviously (20) and (21) are independently distributed, hence $\sqrt{N - 2}$ times
the ratio of (21) to the square root of (20), namely,

$$t = \sqrt{N - 2}\; \frac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{N - 2}\; s} = \frac{a_1 (a - \alpha) \sqrt{N - 2}}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}} \tag{22}$$

has the Student distribution with $N - 2$ degrees of freedom. Denote by $t_0$ the
critical value of $t$ corresponding to a chosen probability level. The deviation
of $a$ from an assumed population value $\alpha$ is significant if

$$\left| \frac{a_1 (a - \alpha) \sqrt{N - 2}}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}} \right| \geq t_0.$$

The confidence interval for $\alpha$ can be obtained by solving the equation in $\alpha$,

$$a_1^2 (a - \alpha)^2 = \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right] \frac{t_0^2}{N - 2}. \tag{23}$$


Now we shall show that if the relation

$$a_1^2 > \frac{(s'_x)^2\, t_0^2}{N - 2} \tag{24}$$

holds, the roots $\alpha_1$ and $\alpha_2$ are real and $a$ is contained in the interior of the interval
$[\alpha_1, \alpha_2]$. From (19) it follows that

$$(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} > 0$$

for all values of $\alpha$. Hence, for $\alpha = a$ the left hand side of (23) is smaller than
the right hand side. On account of (24) there exists a value $\alpha' > a$ and a
value $\alpha'' < a$ such that the left hand side of (23) is greater than the right hand
side for $\alpha = \alpha'$ and $\alpha = \alpha''$. Hence one root must lie between $a$ and $\alpha'$ and the
other root between $\alpha''$ and $a$. This proves our statement. The relation (24)
always holds for sufficiently large $N$ if Assumption V is fulfilled. The confidence
interval of $\alpha$ is the interval $[\alpha_1, \alpha_2]$. For very small $N$ (24) may not hold.

Finally I should like to remark that no essentially better estimate of the
variance of $\eta - \alpha\varepsilon$ can be given than the expression $s^2$ in (19). In fact, we
have $2N$ observations $x_1, \cdots, x_N$; $y_1, \cdots, y_N$. For the estimation of the
variance of $\eta - \alpha\varepsilon$ we must eliminate the unknowns $X_1, \cdots, X_N$ and $\beta$. (The
unknowns $Y_1, \cdots, Y_N$ are determined by the relations $Y_i = \alpha X_i + \beta$ and $\alpha$ is
involved in the expression whose variance is to be determined.) Hence we have
at most $N - 1$ degrees of freedom and the estimate in (19) is based on $N - 2$
degrees of freedom.
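The interval $[\alpha_1, \alpha_2]$ can be computed by writing (23) as a quadratic in $\alpha$. Since $a_1 a = a_2$, (23) is equivalent to $A\alpha^2 + B\alpha + C = 0$ with $A = a_1^2 - k (s'_x)^2$, $B = -2(a_1 a_2 - k s'_{xy})$, $C = a_2^2 - k (s'_y)^2$ and $k = t_0^2/(N-2)$, so that relation (24) is the condition $A > 0$. The Python sketch below is illustrative only: the function names and the data are assumptions, and the critical value $t_0$ is supplied by the user from a table of Student's distribution.

```python
import math


def group_summaries(x, y):
    """Return a1, a2 of (1) and the within-group moments (s'_x)^2, (s'_y)^2, s'_xy."""
    n = len(x)
    m = n // 2
    x1, x2 = sum(x[:m]) / m, sum(x[m:]) / m
    y1, y2 = sum(y[:m]) / m, sum(y[m:]) / m
    a1, a2 = (x1 - x2) / 2, (y1 - y2) / 2      # equal to (1) because m = N/2
    sx2 = (sum((v - x1) ** 2 for v in x[:m]) + sum((v - x2) ** 2 for v in x[m:])) / n
    sy2 = (sum((v - y1) ** 2 for v in y[:m]) + sum((v - y2) ** 2 for v in y[m:])) / n
    sxy = (sum((u - x1) * (v - y1) for u, v in zip(x[:m], y[:m]))
           + sum((u - x2) * (v - y2) for u, v in zip(x[m:], y[m:]))) / n
    return a1, a2, sx2, sy2, sxy


def slope_confidence_interval(x, y, t0):
    """Roots of (23) in alpha; t0 is the tabulated critical value for N - 2 d.f."""
    n = len(x)
    a1, a2, sx2, sy2, sxy = group_summaries(x, y)
    k = t0 ** 2 / (n - 2)
    A = a1 ** 2 - k * sx2
    B = -2.0 * (a1 * a2 - k * sxy)
    C = a2 ** 2 - k * sy2
    if A <= 0:
        raise ValueError("relation (24) does not hold; no finite interval")
    disc = max(B * B - 4.0 * A * C, 0.0)       # guard against rounding below zero
    root = math.sqrt(disc)
    return (-B - root) / (2.0 * A), (-B + root) / (2.0 * A)


# Hypothetical data; 2.776 is the two-sided 5% point of t with 4 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
a1, a2, _, _, _ = group_summaries(xs, ys)
a_hat = a2 / a1
lo, hi = slope_confidence_interval(xs, ys, t0=2.776)
```

As the argument following (24) shows, the estimate `a_hat` lies strictly inside the interval whenever (24) holds and the residual sum of squares is positive.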


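5. Confidence Interval for $\beta$ if $\alpha$ is Given. In this case the best estimate of $\beta$
is given by the expression:

$$b_\alpha = \bar y - \alpha \bar x, \quad \text{where } \bar x = \frac{x_1 + \cdots + x_N}{N} \text{ and } \bar y = \frac{y_1 + \cdots + y_N}{N}.$$

We have

$$b_\alpha - \beta = (\bar y - \bar Y) - \alpha (\bar x - \bar X) = \bar\eta - \alpha \bar\varepsilon,$$

where

$$\bar\varepsilon = \frac{\varepsilon_1 + \cdots + \varepsilon_N}{N}, \qquad \bar\eta = \frac{\eta_1 + \cdots + \eta_N}{N}.$$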











Hence,

$$\frac{\sqrt{N}\, (b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2}} \tag{25}$$

is normally distributed with zero mean and unit variance. It is obvious that
the expressions (20) and (25) are independently distributed. Hence $\sqrt{N - 2}$
times the ratio of (25) to the square root of (20), i.e.

$$t = \sqrt{N - 2}\; \frac{\sqrt{N}\, (b_\alpha - \beta)}{\sqrt{N - 2}\; s} = \frac{\sqrt{N - 2}\, (b_\alpha - \beta)}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}$$

has the Student distribution with $N - 2$ degrees of freedom. Denoting by $t_0$
the critical value of $t$ according to the chosen probability level, the confidence
interval for $\beta$ is given by the interval:

$$\left[\, b_\alpha - t_0\, \frac{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}{\sqrt{N - 2}}, \;\; b_\alpha + t_0\, \frac{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}{\sqrt{N - 2}} \,\right].$$
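The interval for $\beta$ with $\alpha$ given can be sketched as follows. By the identity (19), $N[(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}]$ equals the sum of squared deviations of $y_i - \alpha x_i$ from their group means, which is what the loop accumulates. The function name and the data are illustrative assumptions, and $t_0$ is again taken from a table.

```python
import math


def intercept_confidence_interval(x, y, alpha, t0):
    """Confidence interval for beta when the slope alpha is given."""
    n = len(x)
    m = n // 2
    q = 0.0   # accumulates N * [(s'_y)^2 + alpha^2 (s'_x)^2 - 2 alpha s'_xy]
    for gx, gy in ((x[:m], y[:m]), (x[m:], y[m:])):
        r = [v - alpha * u for u, v in zip(gx, gy)]
        rbar = sum(r) / m
        q += sum((ri - rbar) ** 2 for ri in r)
    b_alpha = sum(y) / n - alpha * sum(x) / n
    half_width = t0 * math.sqrt(q / n) / math.sqrt(n - 2)
    return b_alpha - half_width, b_alpha + half_width


# Hypothetical data; alpha = 2 is assumed known, t0 = 2.776 for 4 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
lo, hi = intercept_confidence_interval(xs, ys, alpha=2.0, t0=2.776)
```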
6. Confidence Region for $\alpha$ and $\beta$ Jointly. In most practical cases we want to
know confidence limits for $\alpha$ and $\beta$ jointly. A pair of values $\alpha$, $\beta$ can be represented
in the plane by the point with the coordinates $\alpha$, $\beta$. A region $R$ of this
plane is called confidence region of the true point $(\alpha, \beta)$ corresponding to the
probability level $P$ if the following two conditions are fulfilled.

(1) The region $R$ is a function of the observations $x_1, \cdots, x_N$; $y_1, \cdots, y_N$,
i.e. it is uniquely determined by the observations.

(2) Before performing the experiment the probability that we shall obtain
observed values such that $(\alpha, \beta)$ will be contained in $R$, is exactly equal to $P$.
$P$ is usually chosen to be equal to .95 or .99.

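We have shown that the expressions (21) and (25), i.e.

$$\frac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2}} \quad \text{and} \quad \frac{\sqrt{N}\, (b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\varepsilon^2}},$$

are normally distributed with zero mean and unit variance. Now we shall
show that these two quantities are independently distributed. For this purpose
we have only to show that $\bar x$, $\bar y$, $a_1$ and $a_2$ are independently distributed ($a_1$ and $a_2$
are defined in (1)), but since

$$a_1 - E(a_1) = (\bar\varepsilon_1 - \bar\varepsilon_2)/2,$$
$$a_2 - E(a_2) = (\bar\eta_1 - \bar\eta_2)/2,$$
$$\bar x - E(\bar x) = \bar\varepsilon,$$
$$\bar y - E(\bar y) = \bar\eta,$$

we have only to show that $\bar\varepsilon$, $\bar\eta$, $\bar\varepsilon_1 - \bar\varepsilon_2$, $\bar\eta_1 - \bar\eta_2$ are independently distributed.
We obviously have

$$\bar\varepsilon = \frac{\bar\varepsilon_1 + \bar\varepsilon_2}{2}, \qquad \bar\eta = \frac{\bar\eta_1 + \bar\eta_2}{2}.$$

It is evident that $\bar\varepsilon_1$, $\bar\varepsilon_2$, $\bar\eta_1$ and $\bar\eta_2$ are independently distributed. Hence,
$E[\bar\varepsilon(\bar\varepsilon_1 - \bar\varepsilon_2)] = (E\bar\varepsilon_1^2 - E\bar\varepsilon_2^2)/2 = 0$ and also $E[\bar\eta(\bar\eta_1 - \bar\eta_2)] = (E\bar\eta_1^2 - E\bar\eta_2^2)/2 = 0$.
Since $\bar\varepsilon_1 - \bar\varepsilon_2$, $\bar\eta_1 - \bar\eta_2$, and $\bar\varepsilon$ and $\bar\eta$ are normally distributed, the independence
of this set of variables is proved, and therefore also (21) and (25) are independently
distributed. It is obvious that the expression (20) is distributed
independently of (21) and (25). From this it follows that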


$$F = \frac{N - 2}{2} \cdot \frac{N \left[ a_1^2 (a - \alpha)^2 + (\bar y - \alpha \bar x - \beta)^2 \right]}{(N - 2)\, s^2} = \frac{(N - 2) \left[ a_1^2 (a - \alpha)^2 + (\bar y - \alpha \bar x - \beta)^2 \right]}{2 \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right]} \tag{26}$$

has the $F$-distribution (analysis of variance distribution) with 2 and $N - 2$
degrees of freedom. The $F$-distribution is tabulated in Snedecor's book: Calculation
and Interpretation of Analysis of Variance, Collegiate Press, Ames, Iowa,
1934. The distribution of $\tfrac{1}{2} \log F = z$ is tabulated in R. A. Fisher's book:
Statistical Methods for Research Workers, London, 1936. Denote by $F_0$ the
critical value of $F$ corresponding to the chosen probability level $P$. Then the
confidence region $R$ is the set of points $(\alpha, \beta)$ which satisfy the inequality

$$\frac{N - 2}{2} \cdot \frac{a_1^2 (a - \alpha)^2 + (\bar y - \alpha \bar x - \beta)^2}{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}} \leq F_0. \tag{27}$$


The boundary of the region is given by the equation

$$a_1^2 (a - \alpha)^2 + (\bar y - \alpha \bar x - \beta)^2 = \frac{2 F_0}{N - 2} \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right]. \tag{28}$$

This is the equation of an ellipse. Hence the region $R$ is the interior of the
ellipse defined by the equation (28). If Assumption V holds, the lengths of the
axes of the ellipse are of the order $1/\sqrt{N}$, hence with increasing $N$ the ellipse
reduces to a point.
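Whether a candidate point $(\alpha, \beta)$ belongs to the region can be checked by evaluating the left-hand side of (27). The Python sketch below is illustrative only: the function names and data are assumptions, $a_1^2 (a - \alpha)^2$ is computed as $(a_2 - \alpha a_1)^2$, and the critical value $F_0$ is supplied from a table.

```python
def joint_region_statistic(x, y, alpha, beta):
    """Left-hand side of (27) for a candidate point (alpha, beta)."""
    n = len(x)
    m = n // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / n
    a2 = (sum(y[:m]) - sum(y[m:])) / n
    q = 0.0   # N * [(s'_y)^2 + alpha^2 (s'_x)^2 - 2 alpha s'_xy], via identity (19)
    for gx, gy in ((x[:m], y[:m]), (x[m:], y[m:])):
        r = [v - alpha * u for u, v in zip(gx, gy)]
        rbar = sum(r) / m
        q += sum((ri - rbar) ** 2 for ri in r)
    bracket = q / n
    if bracket == 0:
        raise ValueError("degenerate case: all within-group residuals vanish")
    numer = (a2 - alpha * a1) ** 2 + (sum(y) / n - alpha * sum(x) / n - beta) ** 2
    return (n - 2) / 2 * numer / bracket


def in_joint_region(x, y, alpha, beta, f0):
    """True if (alpha, beta) satisfies inequality (27) for the critical value f0."""
    return joint_region_statistic(x, y, alpha, beta) <= f0


# Hypothetical data; 6.94 is the tabulated 5% point of F with 2 and 4 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
m = len(xs) // 2
a_hat = (sum(ys[:m]) - sum(ys[m:])) / (sum(xs[:m]) - sum(xs[m:]))
b_hat = sum(ys) / len(ys) - a_hat * sum(xs) / len(xs)
inside = in_joint_region(xs, ys, a_hat, b_hat, f0=6.94)
outside = in_joint_region(xs, ys, 10.0, 0.0, f0=6.94)
```

The point estimate $(a, b)$ makes the numerator of (27) vanish, so it always belongs to the region.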


7. The Grouping of the Observations. We have divided the observations in
two equal groups $G_1$ and $G_2$, $G_1$ containing the first half $(x_1, y_1), \cdots, (x_m, y_m)$
and $G_2$ the second half $(x_{m+1}, y_{m+1}), \cdots, (x_N, y_N)$ of the observations. All
the formulas and statements of the previous sections remain exactly valid for
any arbitrary subdivision of the observations in two equal groups, provided
that the subdivision is defined independently of the errors $\varepsilon_1, \cdots, \varepsilon_N$;
$\eta_1, \cdots, \eta_N$. The question of which is the most advantageous grouping arises,
i.e. for which grouping will $a$ be the most efficient estimate of $\alpha$ (will lead to
the shortest confidence interval for $\alpha$). It is easy to see that the greater $|a_1|$
the more efficient is the estimate $a$ of $\alpha$. The expression $|a_1|$ becomes a maximum
if we order the observations such that $x_1 < x_2 < \cdots < x_N$. That is to
say $|a_1|$ becomes a maximum if we group the observations according to the
following:


RULE I. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements
$x_j$ ($j \neq i$) of the series $x_1, \cdots, x_N$ for which $x_j < x_i$ is less than $m = N/2$. The
point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $x_j$ ($j \neq i$) for which $x_j < x_i$
is greater than or equal to $m$.


This grouping, however, depends on the observed values $x_1, \cdots, x_N$ and is
therefore in general not entirely independent of the errors $\varepsilon_1, \cdots, \varepsilon_N$. Let us
now consider the grouping according to the following:


RULE II. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements
$X_j$ of the series $X_1, \cdots, X_N$ for which $X_j < X_i$ ($j \neq i$) is less than $m$. The
point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $X_j$ for which $X_j < X_i$ ($j \neq i$)
is equal to or greater than $m$.

The grouping according to Rule II is entirely independent of the errors
$\varepsilon_1, \cdots, \varepsilon_N$; $\eta_1, \cdots, \eta_N$. It is identical with the grouping according to Rule I
in the following case: Denote by $x$ the median of $x_1, \cdots, x_N$; assume that $\varepsilon$
can take values only within the finite interval $[-c, +c]$ and that all the values
$x_1, \cdots, x_N$ fall outside the interval $[x - c, x + c]$. It is easy to see that in
this case $x_i < x$ ($i = 1, \cdots, N$) holds if and only if $X_i < X$, where $X$ denotes
the median of $X_1, \cdots, X_N$. Hence the grouping according to Rule II is
identical to that according to Rule I and therefore the grouping according to
Rule I is independent of the errors $\varepsilon_1, \cdots, \varepsilon_N$. In such cases we get the best
estimate of $\alpha$ by grouping the observations according to Rule I. Practically,
we can use the grouping according to Rule I and regard it as independent of the
errors $\varepsilon_1, \cdots, \varepsilon_N$; $\eta_1, \cdots, \eta_N$ if there exists a positive value $c$ for which the
probability that $|\varepsilon| > c$ is negligibly small and the number of observations
contained in $[x - c, x + c]$ is also very small.
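The grouping according to Rule I amounts to sorting the observations by their $x$-values and cutting at the middle. A minimal Python sketch follows; the function names and data are illustrative assumptions, and the $x$-values are assumed distinct, since with ties the sort order decides the assignment.

```python
def rule_one_groups(x, y):
    """Split the points into G1 (the m smallest x-values) and G2 (the rest)."""
    n = len(x)
    m = n // 2
    order = sorted(range(n), key=lambda i: x[i])
    g1 = [(x[i], y[i]) for i in order[:m]]
    g2 = [(x[i], y[i]) for i in order[m:]]
    return g1, g2


def slope_from_groups(g1, g2):
    """Slope estimate (2) for two groups of equal size."""
    dx = sum(p[0] for p in g1) - sum(p[0] for p in g2)
    dy = sum(p[1] for p in g1) - sum(p[1] for p in g2)
    return dy / dx


# Hypothetical unordered, error-free data on the line y = 2x + 1.
xs = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
ys = [2.0 * v + 1.0 for v in xs]
g1, g2 = rule_one_groups(xs, ys)
slope = slope_from_groups(g1, g2)
```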

Denote by $a'$ the value of $a$ which we obtain by grouping the observations
according to Rule I and by $a''$ the value of $a$ if we group the observations
according to Rule II. The value $a''$ is in general unknown, since the values
$X_1, \cdots, X_N$ are unknown, except in the special case considered above, when
we have $a'' = a'$. We will now show that an upper and a lower limit for $a''$
can always be given. First, we have to determine a positive value $c$ such that
the probability that $|\varepsilon| > c$ is negligibly small. The value of $c$ may often be
determined before we make the observations, having some a priori knowledge
about the possible range of the errors. If this is not the case, we can estimate
the value of $c$ from the data. It is well known that if we have errors in both
variables and fit a straight line by the method of least squares minimizing in
the $x$-direction, the sum of the squared deviations divided by the number of
degrees of freedom will overestimate $\sigma_\varepsilon^2$. Hence, if $\varepsilon$ is normally distributed,
we can consider the interval $[-3v, 3v]$ as the possible range of $\varepsilon$, i.e. $c = 3v$,
where $v^2$ denotes the sum of the squared residuals divided by the number of
degrees of freedom. If the distribution of $\varepsilon$ is unknown, we shall have to take
for $c$ a somewhat larger value, for instance $c = 5v$. After having determined $c$,
upper and lower limits for $a''$ can be given as follows: we consider the system $S$
of all possible groupings satisfying the conditions:

(1) If $x_i < x - c$ the point $(x_i, y_i)$ belongs to the group $G_1$.

(2) If $x_i > x + c$ the point $(x_i, y_i)$ belongs to the group $G_2$.

We calculate the value of $a$ according to each grouping of the system $S$ and
denote the minimum of these values by $a^*$, and the maximum by $a^{**}$. Since
the grouping according to Rule II is contained in the system $S$, $a^*$ is a lower
and $a^{**}$ an upper limit of $a''$.
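The limits $a^*$ and $a^{**}$ can be found by enumerating the system $S$. The Python sketch below is illustrative and rests on assumptions not fixed by the text: the function name is hypothetical, the median for even $N$ is taken as the midpoint of the two central values, and groupings whose two $x$-sums coincide are skipped because the slope is then undefined. The cost grows combinatorially with the number of points inside $[x - c, x + c]$.

```python
from itertools import combinations
from statistics import median


def slope_limits(x, y, c):
    """Minimum a* and maximum a** of the slope over all groupings in the system S."""
    n = len(x)
    m = n // 2
    med = median(x)
    forced1 = [i for i in range(n) if x[i] < med - c]        # condition (1)
    forced2 = [i for i in range(n) if x[i] > med + c]        # condition (2)
    free = [i for i in range(n) if med - c <= x[i] <= med + c]
    need = m - len(forced1)      # free points still required to fill G1
    slopes = []
    for extra in combinations(free, need):
        g1 = sorted(forced1 + list(extra))
        g2 = [i for i in range(n) if i not in g1]
        dx = sum(x[i] for i in g1) - sum(x[i] for i in g2)
        dy = sum(y[i] for i in g1) - sum(y[i] for i in g2)
        if dx != 0:
            slopes.append(dy / dx)
    if not slopes:
        raise ValueError("no admissible grouping with distinct x-sums")
    return min(slopes), max(slopes)


# Hypothetical data: with c = 0.1 no observation is near the median, with c = 1 two are.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
narrow = slope_limits(xs, ys, c=0.1)
wide = slope_limits(xs, ys, c=1.0)
```

When no observation falls inside $[x - c, x + c]$ the system $S$ contains only the Rule I grouping, so the two limits coincide.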

Let $g$ be a grouping contained in $S$ and denote by $I_g$ the confidence interval
for $\alpha$ which we obtain from formula (23) using the grouping $g$. Denote further
by $I$ the smallest interval which contains the intervals $I_g$ for all elements $g$
of $S$. Then $I$ contains also the confidence interval corresponding to the grouping
according to Rule II. If we denote by $P$ the chosen probability level (say
$P = .95$), then we can say: If we were to draw a sample consisting of $N$ pairs
of observations $(x_1, y_1), \cdots, (x_N, y_N)$, the probability is greater than or equal
to $P$ that we shall obtain a system of observations such that the interval $I$ will
include the true slope $\alpha$.

The computing work for the determination of $I$ may be considerable if the
number of observations within the interval $[x - c, x + c]$ is not small. We
can get a good approximation to $I$ by less computation work as follows: First
we calculate the slope $a'$ using the grouping according to Rule I and determine
the confidence interval $[a' - \delta, a' + \Delta]$ according to formula (23). Denote by
$a(g)$ the value of the slope, i.e. the value of $\dfrac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2}$, obtained from a grouping
$g$ of the system $S$, and by $[a(g) - \delta_g, a(g) + \Delta_g]$ the corresponding confidence
interval calculated from (23). Neglecting the differences $(\delta_g - \delta)$ and $(\Delta_g - \Delta)$,
we obtain for $I$ the interval $[a^* - \delta, a^{**} + \Delta]$.

If the difference $a^{**} - a^*$ is small, we can consider $I = [a^* - \delta, a^{**} + \Delta]$ as
the correct confidence interval of $\alpha$ corresponding to the chosen probability
level $P$. If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large.
In such cases we may get a much shorter confidence interval by using some
other grouping defined independently of the errors $\varepsilon_1, \cdots, \varepsilon_N$; $\eta_1, \cdots, \eta_N$.
For instance if we see that the values $x_1, \cdots, x_N$ considered in the order as
they have been observed, show a monotonically increasing (or decreasing) tendency,
we shall define the group $G_1$ as the first half, and the group $G_2$ as the
second half of the observations. Though we decide to make this grouping after
having observed that the values $x_1, \cdots, x_N$ show a clear trend, the grouping
can be considered as independent of the errors $\varepsilon_1, \cdots, \varepsilon_N$. In fact, if the
range of the error $\varepsilon$ is small in comparison to the true part $X$, the trend tendency
of the values $x_1, \cdots, x_N$ will not be affected by the size of the errors $\varepsilon_1, \cdots, \varepsilon_N$.
We may use for the grouping also any other property of the data which is
independent of the errors.

The results of the preceding considerations can be summarized as follows:
We use first the grouping according to Rule I, calculate the slope $a' = \dfrac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2}$
and the corresponding confidence interval $[a' - \delta, a' + \Delta]$ (formula (23)). This
confidence interval cannot be considered as exact since the grouping according
to Rule I is not completely independent of the errors. In order to take account
of this fact, we calculate $a^*$ and $a^{**}$. If $a^{**} - a^*$ is small, we consider $I =
[a^* - \delta, a^{**} + \Delta]$ with practical approximation as the correct confidence interval.
If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large. We can
only say that $I$ is a confidence interval corresponding to a probability level
greater than or equal to the chosen one. In such cases we should try to use
some other grouping defined independently of the errors, which eventually will
lead to a considerably shorter confidence interval.

Analogous considerations hold regarding the joint confidence region for $\alpha$
and $\beta$. We use the grouping according to Rule I and calculate from (27) the
corresponding confidence region $R$. If $|a^{**} - a^*|$ and $|b^{**} - b^*|$ are small
($b^* = \bar y - a^* \bar x$ and $b^{**} = \bar y - a^{**} \bar x$) we enlarge $R$ to a region $\bar R$ corresponding
to the fact that $a$ and $b$ may take any values within the intervals $[a^*, a^{**}]$ and
$[b^{**}, b^*]$ respectively. The region $\bar R$ can be considered with practical approximation
as the correct confidence region. If $|a^{**} - a^*|$ or $|b^{**} - b^*|$ is large,
we may try some other grouping defined independently of the errors, which
may lead to a smaller confidence region. In any case $\bar R$ represents a confidence
region corresponding to a probability level greater than or equal to the
chosen one.


8. Some Remarks on the Consistency of the Estimates of $\alpha$, $\beta$, $\sigma_\varepsilon$, $\sigma_\eta$. We
have shown in section 3 that the given estimates of $\alpha$, $\beta$, $\sigma_\varepsilon$ and $\sigma_\eta$ are consistent
if condition V is satisfied.

If the values $X_1, \cdots, X_N$ are not obtained by random sampling, it will in
general be possible to define a grouping which is independent of the errors and
for which condition V is satisfied. We can sometimes arrange the experiments
such that no values of the series $x_1, \cdots, x_N$ should be within the interval
$[x - c, x + c]$ where $x$ denotes the median of $x_1, \cdots, x_N$ and $c$ the range of
the error $\varepsilon$. In such cases, as we saw, the grouping according to Rule I is
independent of the errors. Condition V is certainly satisfied if we group the
data according to Rule I.

Let us now consider the case that $X_1, \cdots, X_N$ are random variables independently
distributed, each having the same distribution. Denote by $X$ a
random variable having the same probability distribution as possessed by each
of the random variables $X_1, \cdots, X_N$. Assuming that $X$ has a finite second
moment, the expression in condition V will approach zero stochastically with
$N \to \infty$ for any grouping defined independently of the values $X_1, \cdots, X_N$.
It is possible, however, to define a grouping independent of the errors (but not
independent of $X_1, \cdots, X_N$) for which the expression in V does not approach
zero, provided that $X$ has the following property: There exists a real value $\lambda$
such that the probability that $X$ will lie within the interval $[\lambda - c, \lambda + c]$
($c$ denotes the range of the error $\varepsilon$) is zero, the probability that $X > \lambda + c$
is positive, and the probability that $X < \lambda - c$ is positive. The grouping can
be defined, for instance, as follows:

The $i$-th observation $(x_i, y_i)$ belongs to the group $G_1$ if $x_i < \lambda$ and to $G_2$ if
$x_i > \lambda$. We continue the grouping according to this rule up to a value $i$ for
which one of the groups $G_1$, $G_2$ contains already $N/2$ elements. All further
observations belong to the other group.

It is easy to see that the probability is equal to 1 that the relation $x_i < \lambda$
is equivalent to the relation $X_i < \lambda - c$ and the relation $x_i > \lambda$ is equivalent to
the relation $X_i > \lambda + c$. Hence this grouping is independent of the errors.
Since for this grouping condition V is satisfied, our statement is proved.

If $X$ has not the property described above, it may happen that for every
grouping defined independently of the errors, the expression in condition V converges
always to zero stochastically. Such a case arises for instance if $X$, $\varepsilon$ and
$\eta$ are normally distributed.¹⁴ It can be shown that in this case no consistent
estimates of the parameters $\alpha$ and $\beta$ can be given, unless we have some additional
information not contained in the data (for instance we know a priori the
ratio $\sigma_\varepsilon/\sigma_\eta$).


9. Structural Relationship and Prediction.¹⁵ The problem discussed in this
paper was the question as to how to estimate the relationship between the true
parts $X$ and $Y$. We shall call the relationship between the true parts the structural
relationship. The problem of finding the structural relationship must not
be confused with the problem of prediction of one variable by means of the
other. The problem of prediction can be formulated as follows: We have observed
$N$ pairs of values $(x_1, y_1), \cdots, (x_N, y_N)$. A new observation on $x$ is
given and we have to estimate the corresponding value of $y$ by means of our
previous observations $(x_1, y_1), \cdots, (x_N, y_N)$. One might think that if we have
estimated the structural relationship between $X$ and $Y$, we may estimate $y$ by
the same relationship. That is to say, if the estimated structural relationship
is given by $Y = aX + b$, we may estimate $y$ from $x$ by the same formula:
$y = ax + b$. This procedure may lead, however, to a biased estimate of $y$.
This is, for instance, the case if $X$, $\varepsilon$ and $\eta$ are normally distributed. It can
easily be shown in this case that for any given $x$ the conditional expectation of
$y$ is a linear function of $x$, that the slope of this function is different from the
slope of the structural relationship, and that among all unbiased estimates of
$y$ which are linear functions of $x$, the estimate obtained by the method of least
squares has the smallest variance. Hence in this case we have to use the least
square estimate for purposes of prediction. Even if we would know exactly the
structural relationship $Y = \alpha X + \beta$, we would get a biased estimate of $y$ by
putting $y = \alpha x + \beta$.

Let us consider now the following example: $X$ is a random variable having
a rectangular distribution with the range $[0, 1]$. The random variable $\varepsilon$ has a
rectangular distribution with the range $[-0.1, +0.1]$. For any given $x$ let us
denote the conditional expectation of $y$ by $E(y \mid x)$ and the conditional expectation
of $X$ by $E(X \mid x)$. Then we obviously have

$$E(y \mid x) = \alpha E(X \mid x) + \beta.$$


14 I wish to thank Professor Hotelling for drawing my attention to this case.

15 I should like to express my thanks to Professor Hotelling for many interesting suggestions
and remarks on this subject.

Now let us calculate $E(X \mid x)$. It is obvious that the joint distribution of $X$ and
$\varepsilon$ is given by the density function:

$$5 \, dX \, d\varepsilon,$$

where $X$ can take any value within the interval $[0, 1]$ and $\varepsilon$ can take any value
within $[-0.1, +0.1]$. From this we obtain easily that the joint distribution of
$x$ and $X$ is given by the density function

$$5 \, dx \, dX,$$

where $x$ can take any value within the interval $[-0.1, 1.1]$ and $X$ can take any
value lying in both intervals $[0, 1]$ and $[x - 0.1, x + 0.1]$ simultaneously. Denote
by $I_x$ the common part of these two intervals. Then for any fixed $x$ the
relative distribution of $X$ is given by the probability density

$$\frac{dX}{\int_{I_x} dX}.$$

Hence, we have

$$E(X \mid x) = \frac{\int_{I_x} X \, dX}{\int_{I_x} dX}.$$

We have to consider 3 cases:

(1) $0.1 \leq x \leq 0.9$. In this case $I_x = [x - 0.1, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{x-0.1}^{x+0.1} X \, dX}{\int_{x-0.1}^{x+0.1} dX} = x.$$

(2) $-0.1 \leq x \leq 0.1$. Then $I_x = [0, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{0}^{x+0.1} X \, dX}{\int_{0}^{x+0.1} dX} = \frac{x}{2} + 0.05.$$

(3) $0.9 \leq x \leq 1.1$. Then $I_x = [x - 0.1, 1]$ and

$$E(X \mid x) = \frac{\int_{x-0.1}^{1} X \, dX}{\int_{x-0.1}^{1} dX} = \frac{x}{2} + 0.45.$$
Since

$$E(y \mid x) = \alpha E(X \mid x) + \beta,$$

we see that the structural relationship gives an unbiased prediction of $y$ from $x$
if $0.1 \leq x \leq 0.9$, but not in the other cases.
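The three cases reduce to one rule: for a uniform density, $E(X \mid x)$ is the midpoint of the interval $I_x$. A short Python sketch of this check follows; the function name and the default half-range `c` are illustrative assumptions.

```python
def conditional_mean_true_value(x, c=0.1):
    """E(X | x) for X uniform on [0, 1] and an error uniform on [-c, +c].

    The conditional density of X is uniform on I_x, the common part of
    [0, 1] and [x - c, x + c], so the conditional mean is its midpoint.
    """
    if not -c <= x <= 1.0 + c:
        raise ValueError("x lies outside the possible range [-c, 1 + c]")
    lower = max(0.0, x - c)
    upper = min(1.0, x + c)
    return (lower + upper) / 2.0


middle = conditional_mean_true_value(0.5)      # case (1): equals x
left_edge = conditional_mean_true_value(0.0)   # case (2): x/2 + 0.05
right_edge = conditional_mean_true_value(1.0)  # case (3): x/2 + 0.45
```

Only in case (1) does $E(X \mid x)$ coincide with $x$, which is why the structural line predicts without bias there and not near the ends of the range.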

The problem of cases for which the structural relationship is appropriate also
for purposes of prediction needs further investigation. I should like to mention
a class of cases where the structural relationship has to be used also for prediction.
Assume that we have observed $N$ values $(x_1, y_1), \cdots, (x_N, y_N)$ of the variables
$x$ and $y$ for which the conditions I-IV of section 2 hold. Then we make a new
observation on $x$ obtaining the value $x'$. We assume that the last observation
on $x$ has been made under changed conditions such that we are sure that $x'$ does
not contain error, i.e. $x'$ is equal to the true part $X'$. Such a situation may arise
for instance if the error $\varepsilon$ is due to errors of measurement and the last observation
has been made with an instrument of great precision for which the error of
measurement can be neglected. In such cases the prediction of the corresponding
$y'$ has to be made by means of the estimated structural relationship, i.e. we
have to put $y' = ax' + b$.

The knowledge of the structural relationship is essential for constructing any 
theory in the empirical sciences. The laws of the empirical sciences mostly 
express relationships among a limited number of variables which would prevail 
exactly if the disturbing influence of a great number of other variables could 
be eliminated. In our experiments we never succeed in eliminating completely 
these disturbances. Hence in deducing laws from observations, we have the 
task of estimating structural relationships. 


COLUMBIA UNIVERSITY,
New York, N. Y. 





A METHOD FOR MINIMIZING THE SUM OF ABSOLUTE VALUES 
OF DEVIATIONS 


By ROBERT R. SINGLETON


1. Introduction. In the Philosophical Magazine, 7th series, May 1930, E. C. 
Rhodes described a method of computation for the estimation of parameters 
by minimizing the sum of absolute values of deviations. His is an iterative 
and recursive method, in the following sense. There is a direct method for 
minimization with one parameter. Assuming a method for minimization with 
n — 1 parameters, Rhodes imposes a relation between the n parameters (in an 
n-parameter problem) and finds a restricted minimum by the method for n — 1 
parameters. In this sense his method is recursive. He then repeats the process, 
by imposing on the n parameters a new relation determined by the restricted 
minimum. In this sense his method is iterative. The process is finite, ending 
when a restricted minimum immediately succeeds itself, indicating a true 
minimum. 

Rhodes’ paper presents the method without proof. The purpose of the 
present paper is to analyze the situation in detail sufficient to indicate proofs 
for various methods, and to present a new method which reduces the labor of 
solution by eliminating the recursive feature. The iterative approach is re- 
tained. The solution of Rhodes’ illustrative problem will be given for com- 
parison between the two methods. 

The paper uses geometric terminology and develops to quite an extent the 
geometry of a surface representing the summed absolute deviations. This 
seems the clearest means of presenting the relationships. Further analysis of 
the properties of this surface should lead to an even more direct method for 
attaining the minimum than the one here presented. 

In the writing of the paper, no attention has been given to sets of observa- 
tions or equations among which a linear dependence may exist. In practice, 
such a situation almost never occurs. If the need arises, the adjustments 
which must be made to take care of dependence are in each case fairly obvious. 


2. Geometric Analogue of Summed Absolute Deviations. Let n observa- 
tions on vy + 1 variates be represented by 24, y* where i = 1,---,nj;a = 
1,---,». Unless otherwise noted, latin indices have range 1 to n, greek indices, 
1 to vy. The summation convention of tensor analysis is used. 

The variates are to be statistically related by the linear function¹

$\hat y^i = x^i_\alpha u^\alpha$,

$\hat y^i$ being an estimate of $y^i$. The $u^\alpha$ are to be determined so that $v = \sum_i |\hat y^i - y^i|$ is a minimum. Set

(1) $v^i = x^i_\alpha u^\alpha - y^i$

and determine functions $e^i(u^\alpha)$ so that $e^i v^i \ge 0$, and $|e^i| = 1$. It is immaterial that $e^i$ is not uniquely determined when $u^\alpha$ satisfies $v^i = 0$. Then $v = \sum_i e^i v^i$ is to be minimized. Using (1),

(2) $v = x_\alpha u^\alpha - y$

where

$x_\alpha = \sum_i e^i x^i_\alpha, \qquad y = \sum_i e^i y^i.$

¹ This includes the linear function with a constant, since a variate $x^i_\alpha = 1$ may be used.
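The passage from the sum of absolute values to the linear form (2) can be checked numerically. The following is a minimal sketch, not part of the original paper; the data `X`, `y` and the point `u` are made up for illustration, and the sign $e^i = +1$ is chosen arbitrarily when a residual is zero.

```python
def residuals(X, y, u):
    """v^i = x^i_a u^a - y^i for each observation i, as in equation (1)."""
    return [sum(xa * ua for xa, ua in zip(row, u)) - yi for row, yi in zip(X, y)]


def sum_abs_dev(X, y, u):
    """v(u) = sum_i |v^i|, the quantity to be minimized."""
    return sum(abs(v) for v in residuals(X, y, u))


def linear_form(X, y, u):
    """The same value via (2): v = x_a u^a - y, with x_a = sum_i e^i x^i_a and y = sum_i e^i y^i."""
    e = [1 if v >= 0 else -1 for v in residuals(X, y, u)]  # e^i v^i >= 0, |e^i| = 1
    x_bar = [sum(ei * row[a] for ei, row in zip(e, X)) for a in range(len(u))]
    y_bar = sum(ei * yi for ei, yi in zip(e, y))
    return sum(xa * ua for xa, ua in zip(x_bar, u)) - y_bar


# Hypothetical data: a constant variate plus one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 2, 5]
u = [0.5, 1.0]
print(sum_abs_dev(X, y, u), linear_form(X, y, u))
```

Both functions return the same number at any point, since the signs $e^i$ are recomputed from the residuals at that point.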























Consider a Euclidean $(\nu + 1)$-space, $E_{\nu+1}$, with coordinates $u^1, \dots, u^\nu, v$. The coordinate hyperplane perpendicular to the v-axis will be called $E_\nu$. In $E_{\nu+1}$ each of equations (1) for a particular i represents a $\nu$-plane which intersects $E_\nu$ in a $(\nu - 1)$-plane when $v^i = 0$. Each of the equations

(3) $v = e^i (x^i_\alpha u^\alpha - y^i)$

represents two half-planes which touch $E_\nu$ and each other along the $(\nu - 1)$-plane given in $E_\nu$ by the equation

(4) $x^i_\alpha u^\alpha - y^i = 0$.

The functions on the right-hand side of (3) are thus continuous everywhere, and linear in any neighborhood of $E_\nu$ none of whose points satisfies (4). Since a sum of functions continuous and linear in a neighborhood is also continuous and linear in that neighborhood, it follows that the function on the right in (2) is continuous for all u, and linear for every neighborhood of $E_\nu$ containing no points which satisfy (4) for any i. Hence

OBSERVATION I: The surface (S) given in $E_{\nu+1}$ by (2) consists of portions of $\nu$-planes joined together. The projection of these joins on $E_\nu$ forms a network of $(\nu - 1)$-planes determined in $E_\nu$ by equations (4).





3. Existence of a Minimum. Define a “bend of degree r on S’’ to be the 
locus of all points on S whose u-coordinates satisfy a set of r independent 
equations of (4). To each set of r independent equations corresponds a unique 
bend of degree r. 

If a linear relation $u^\alpha = a^\alpha_\sigma \eta^\sigma + b^\alpha$, $\sigma = 1, \dots, \mu < \nu$, rank $(a^\alpha_\sigma) = \mu$, is imposed on $u^\alpha$, all the preceding development, reduced in dimension, applies to the new variates $x^i_\alpha a^\alpha_\sigma$, $y^i - x^i_\alpha b^\alpha$.

OBSERVATION II: A section of S by a plane of any dimension $d < \nu$ has all the properties of an S-surface of dimension d.

Since any set of consistent equations selected from (4) determines such a linear relation for $u^\alpha$, the application of Observation I to any of the bends of S shows that each r-bend consists of linear elements of dimension $\nu - r$, joined at points which lie on linear elements of lesser dimension. Thus S is a polyhedron. Its faces we term complexes of dimension $\nu$, $C_\nu$, and the linear elements of its edges which lie wholly in bends of degree r, but not of degree r + 1, are complexes $C_{\nu - r}$ of dimension $\nu - r$. The boundary of any $C_a$, $a > 0$, consists of complexes of lesser dimension. The term complex is not restricted to either open or closed complexes.

Since the function $v(u^\alpha)$ of (2) is non-negative, it possesses a greatest lower bound (g.l.b.) g. Since for some number $h > g$, there exists an N such that for all $|u^\alpha| > N$, $v(u^\alpha) > h$, it follows that for some closed neighborhood of $E_\nu$ the g.l.b. of v is g. Since v is continuous everywhere it attains its g.l.b., and so S has minimum points. Since the minimum of any complex not parallel to $E_\nu$ lies on its boundary, and the boundary consists of complexes, it follows that the minimum points of S consist of $C_0$'s and/or entire complexes of dimension > 0 which are parallel to $E_\nu$. The next section will show that S has a


unique minimum complex (including of course its boundary complexes) and 
furthermore is cup-shaped. 


Fig. 1


4. Convexity Property; Uniqueness of the Minimum. Consider $\nu = 1$ in the preceding treatment (the Greek index, for convenience, not written). S looks generally like Fig. 1. The slope changes only where an equation of (4) has a root. Suppose the point is $u_0$, and $x^1 u_0 - y^1 = 0$. From (3), since $v \ge 0$, it follows that $e^1 x^1 < 0$ for $u < u_0$, $e^1 x^1 > 0$ for $u > u_0$. Since in (2) $x = \sum_i e^i x^i$, and since for h sufficiently small and $u_0 - h < u < u_0 + h$ the only e to change value² is $e^1$, we have that

$x(u_1) + 2|e^1 x^1| = x(u_2)$

where

$u_0 - h < u_1 < u_0 < u_2 < u_0 + h$.

Hence the slope is a monotonic increasing step function. Since for u sufficiently small all $e^i x^i < 0$, and for u sufficiently large all $e^i x^i > 0$, at some intermediate point or points either the slope is zero or it changes from negative to positive without becoming zero. In the first case a single closed $C_1$ is the minimum complex; in the second, a $C_0$. In either case the curve given by (2) when $\nu = 1$ is concave upward and has just one minimum complex, except for complexes of lesser dimension constituting the boundary of this complex. An obvious consequence is

² The e's corresponding to equations proportional to equation (1) also change value at $u_0$. This does not destroy the argument.

LEMMA I. The set of points u for which v is less than some number N form a convex point set.

This result is easily extended to the general dimension $\nu$. If for any two points $u_1$, $u_2$ of $E_\nu$, $v(u_1) < N$ and $v(u_2) < N$, the plane in $E_{\nu+1}$ given by $u^\alpha = u_1^\alpha + \lambda(u_2^\alpha - u_1^\alpha)$ makes a one-dimensional section of S. By Observation II, the points u lying on the projection of this section on $E_\nu$ have the property of Lemma I and of course lie on the straight line joining $u_1$ and $u_2$. This is the property required for a convex point set. Hence

THEOREM I. The set of points $u^\alpha$ of $E_\nu$ for which $v(u^\alpha)$ as given by (2) is less than a fixed quantity form a convex point set.

From this it follows immediately that there is a unique minimum complex. 
It is appropriate here to point out that no two complexes can be contained in a 
single plane of the same dimension. This follows from the equation giving 
monotonicity of slope in one dimension, and Observation II. 


5. Gradient Directions. From here on the treatment will be of v as a function defined on $E_\nu$, and the equations will represent objects in $E_\nu$, unless otherwise stated. Complex and Bend also will refer to the projections on $E_\nu$ of the complexes and bends of S. For a single-valued function defined on $E_\nu$ the gradient at a point is the projection of a normal to the surface representing the function in $E_{\nu+1}$. If the function is defined only over a subspace of $E_\nu$ possessing derivatives, the gradient will be required also to be tangent to the subspace. This is
sufficient to determine a unique direction, and preserves the property that for an 
infinitesimal displacement in any direction the value of the function decreases 
most rapidly in the direction of the gradient. Here gradient is taken negative 
to its usual sense. 

A point u lying on a $C_r$ but not on a $C_{r-1}$ will have a gradient in $C_r$ and also in each higher-dimensional complex on whose boundary $C_r$ lies. If the gradient for u as a point of $C_{r+k}$ points into $C_{r+k}$ (remembering that u lies on the boundary) this will be called a usable gradient. In the case of the greatest k for which there exists a usable gradient, there exists but one $C_{r+k}$ providing such a gradient, and that gradient is the “best” gradient; that is, of all directions in $E_\nu$ it provides the direction of most rapid decrease of the function v. This follows from Theorem I. Furthermore, all complexes of lesser dimension providing usable gradients lie on the boundary of this $C_{r+k}$. In fact

THEOREM II. If for a point u on $C_r$, two complexes $C_s$ and $C_s'$, $s > r$, lying in different bends of degree $\nu - s$ but incident at $C_r$, both provide usable gradients for u, then the complex $C_{s+1}$ on whose boundary lie both $C_s$ and $C_s'$ also provides a usable gradient for u.

This follows from Theorem I. Select $u_1$ on the gradient in $C_s$, $u_2$ on the gradient in $C_s'$, for which $v(u_1) = v(u_2)$. The join of $u_1$ and $u_2$ lies in $C_{s+1}$, and for some point $u_3$ on this join, $v(u_3)$ is less than $v(u_1) = v(u_2)$. Also, the distance $u u_3$ is less than at least one of $u u_1$, $u u_2$. Hence $C_{s+1}$ must contain a usable gradient.


6. Selection of Best Gradient at Bends. The direction of the gradient for a point $u_0$ considered as lying on a $C_\nu$ is given by

(5) $g^\alpha = -x_\alpha(u_0) = -\sum_i e^i(u_0)\, x^i_\alpha$.

If $u_0$ lies in the interior of a face, this is unique. If $u_0$ lies in a bend, so that some $e^i$ are not determined, the $g^\alpha$ for each face is found by selecting the indeterminate e's as +1 or −1, according to the face being considered.
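As an illustration of (5), the sketch below computes $g^\alpha$ at a point in the interior of a face and confirms that a small step along it lowers v by the step length times $\sum_\alpha (g^\alpha)^2$. The data and the step size `h` are hypothetical, chosen so that no residual changes sign during the step.

```python
def sum_abs_dev(X, y, u):
    """v(u) = sum_i |x^i_a u^a - y^i|."""
    return sum(abs(sum(xa * ua for xa, ua in zip(row, u)) - yi) for row, yi in zip(X, y))


def gradient(X, y, u):
    """g^a = -sum_i e^i x^i_a, where e^i is the sign of x^i_a u^a - y^i (equation (5))."""
    g = [0] * len(u)
    for row, yi in zip(X, y):
        v = sum(xa * ua for xa, ua in zip(row, u)) - yi
        e = 1 if v >= 0 else -1
        for a in range(len(u)):
            g[a] -= e * row[a]
    return g


# Hypothetical data; u lies inside a face (no residual is zero).
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 2, 5]
u = [0.5, 1.0]

g = gradient(X, y, u)
h = 0.01  # small enough that no e^i changes sign
u_step = [ua + h * ga for ua, ga in zip(u, g)]
drop = sum_abs_dev(X, y, u) - sum_abs_dev(X, y, u_step)
print(g, drop, h * sum(ga * ga for ga in g))
```

Within a face v is linear, so the decrease is exactly proportional to the step until the first bend is met.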

For a point $u_0$ considered as lying on a bend of degree r, given by r independent equations of (4):

(6) $x^\lambda_\alpha u^\alpha - y^\lambda = 0$, $(\lambda = 1, \dots, r)$,

the gradient for a particular $C_{\nu - r}$, determined by the conditions at the beginning of section 5, is

(7) $g^\alpha = x^\lambda_\alpha k_\lambda - x_\alpha$

where $k_\lambda$ satisfies

$\sum_\alpha x^\mu_\alpha x^\lambda_\alpha k_\lambda = \sum_\alpha x^\mu_\alpha x_\alpha$, $(\mu = 1, \dots, r)$

and $x_\alpha$ is as given in (2), the choice of sign for the indeterminate $e^\lambda$ $(\lambda = 1, \dots, r)$ being immaterial. They may, in fact, be taken as 0 in this instance.
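Equation (7) amounts to projecting $-x_\alpha$ onto the bend: the $k_\lambda$ are chosen so that the result is orthogonal to every active row $x^\lambda_\alpha$. The sketch below is an illustration only. The helper `solve` is a plain Gauss-Jordan routine written for this example, and it assumes the active equations are linearly independent, as the paper does.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A k = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # fails if A is singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def bend_gradient(active_rows, x_bar):
    """g^a = x^l_a k_l - x_a, with k solving the normal equations stated under (7)."""
    gram = [[dot(rm, rl) for rl in active_rows] for rm in active_rows]
    rhs = [dot(rm, x_bar) for rm in active_rows]
    k = solve(gram, rhs)
    return [sum(kl * row[a] for kl, row in zip(k, active_rows)) - x_bar[a]
            for a in range(len(x_bar))]


# Hypothetical example: one active equation in two dimensions.
g = bend_gradient([[1, 1]], [3, -1])
print(g)
```

The returned direction is tangent to the bend, which is the condition imposed at the beginning of section 5.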

For a point $u_0$ lying on an r-bend given by (6), to determine which complex contains the best gradient, each (r − 1)-bend incident on the r-bend at $u_0$ is tested for a usable gradient. Theorem II then determines the complex containing the best gradient.

There are 2r such complexes incident at $u_0$, given by the r sets of equations selected from (6):

(8) (8λ): $x^\sigma_\alpha u^\alpha - y^\sigma = 0$, $(\sigma = 1, \dots, \lambda - 1, \lambda + 1, \dots, r)$, $(\lambda = 1, \dots, r)$.

The two complexes lying in the same (r − 1)-bend have the same equations in (8), but are distinguished later by $e^\lambda(u_0)$ for the omitted equation being taken first +1, then −1.

The gradient for the λth pair of complexes is

$g^\alpha_{\lambda\pm} = x^\sigma_\alpha k_\sigma - x_\alpha$


similar to (7), but not identical. For $e^\lambda = +1$ in determining $x_\alpha$, we have $g_{\lambda+}$, and for $e^\lambda = -1$, $g_{\lambda-}$. We restrict the consideration to $e^\lambda = +1$.

The line in the direction of greatest slope is then

$u^\alpha = u_0^\alpha + g^\alpha_{\lambda+} t$.

Now $u_0$ is here considered lying on the complex given by (8λ) with $e^\lambda = +1$. In order that $g_{\lambda+}$ point into this face, the deviation for the λth observation must exceed 0 when $t > 0$; otherwise, for a displacement in the direction of $g_{\lambda+}$, $e^\lambda$ changes sign immediately and the course is in the other complex. This deviation is

$v^\lambda = x^\lambda_\alpha u^\alpha - y^\lambda = x^\lambda_\alpha u_0^\alpha - y^\lambda + x^\lambda_\alpha g^\alpha_{\lambda+} t = x^\lambda_\alpha g^\alpha_{\lambda+} t$.

Had $g_{\lambda-}$ been used, this deviation must be less than 0. Hence a necessary and sufficient condition that a complex given by (8) with either choice of $e^\lambda$ possess a usable gradient is

(9) $\Phi_{\lambda\pm} = e^\lambda \left[ \sum_\alpha x^\lambda_\alpha x^\sigma_\alpha k_\sigma - \sum_\alpha x^\lambda_\alpha x_\alpha \right] > 0$.

For r = 1 the condition is given by (9) with the first sum merely omitted. $\Phi_{\lambda+}$ and $\Phi_{\lambda-}$ cannot both exceed 0.

When all sets of equations (8λ) are tested by (9) the equations common to all sets possessing a usable gradient determine the complex with the best gradient, retaining the values of e for which (9) was satisfied.


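7. Property of the Minimum Point. For a minimum point, given by (6) with $r = \nu$, all $\Phi$ must be negative. Define $X^{\lambda\sigma} = \sum_\alpha x^\lambda_\alpha x^\sigma_\alpha$ and $X^{\lambda 0} = \sum_\alpha x^\lambda_\alpha x_\alpha$ for convenience. Then in (9), the numbers $k_\sigma$, −1 are seen from their definition in (7) to be proportional to the cofactors of the λth row of the matrix $(X^{\mu\sigma}, X^{\mu 0})$, $\mu$ having the same range as $\lambda$. Thus $\Phi_{\lambda+} = c\,\mathrm{Det}(X^{\mu\sigma}, X^{\mu 0})$, and $\Phi_{\lambda-} = -c\,\mathrm{Det}(X^{\mu\sigma}, X^{\mu 0})$, where in the first case $X^{\mu 0}$ is determined with $e^\lambda = +1$, in the second with $e^\lambda = -1$. The factor of proportionality, c, must be the same since $X^{\mu\sigma}$ is unaffected by change of $e^\lambda$. Now let $X^{\mu *} = \sum_\alpha x^\mu_\alpha x^*_\alpha$ where $x^*_\alpha = \sum_k e^k x^k_\alpha$, the range of k omitting the range of $\lambda$. Then

$\Phi_{\lambda+} = c\,[\mathrm{Det}(X^{\mu\sigma}, X^{\mu *}) + \mathrm{Det}(X^{\mu\sigma}, X^{\mu\lambda})]$

and

$\Phi_{\lambda-} = -c\,[\mathrm{Det}(X^{\mu\sigma}, X^{\mu *}) - \mathrm{Det}(X^{\mu\sigma}, X^{\mu\lambda})]$.

Hence

$\Phi_{\lambda+}\Phi_{\lambda-} = -c^2 \{[\mathrm{Det}(X^{\mu\sigma}, X^{\mu *})]^2 - [\mathrm{Det}(X^{\mu\sigma}, X^{\mu\lambda})]^2\}$.

Now let A represent the square matrix $(x^\lambda_\alpha)$, $\alpha$ giving the rows and $\lambda$ the columns. Let $B_\lambda$ represent the matrix formed from A by replacing the λth column by $x^*_\alpha$. Then

$\Phi_{\lambda+}\Phi_{\lambda-} = -c^2 [\mathrm{Det}^2(A'B_\lambda) - \mathrm{Det}^2(A'A)] = -c^2\,\mathrm{Det}^2 A\,(\mathrm{Det}^2 B_\lambda - \mathrm{Det}^2 A)$

and this will have the same sign as

$\Psi_\lambda = |\mathrm{Det}(A)| - |\mathrm{Det}(B_\lambda)|$.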


Since $\Phi_{\lambda+}$ and $\Phi_{\lambda-}$ are never both positive, and at the minimum are both negative for all $\lambda$, at the minimum all $\Psi_\lambda > 0$. To determine all $\Psi_\lambda$ together, let, in matrix notation, $z' = (z_1, \dots, z_\nu)$ and $x^{*\prime} = (x^*_1, \dots, x^*_\nu)$ where $x^*_\alpha$ were defined previously. Determine z as the solution of $Az = x^*$. Then $|\mathrm{Det}(B_\lambda)|$ are equal to $|z_\lambda|\,|\mathrm{Det}(A)|$. Hence a necessary and sufficient condition that $\Psi_\lambda > 0$ for all $\lambda$ is that all $|z_\lambda|$ be less than one. Hence

THEOREM III: If a zero-complex is given by a set of equations whose matrix is M, a necessary and sufficient condition that the complex be a unique minimum is that the solutions of $M'z = x^*$ be all less than one in absolute value. If k of the solutions are equal to one in absolute value, and the rest are less than one, the minimum is a complex of dimension k with the zero-complex as one of its corners.

The last statement follows since if one solution is 1 in absolute value, a corresponding $\Phi_\lambda = 0$, and hence no gradient, usable or not, exists. Thus the corresponding complex is parallel to $E_\nu$.
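Theorem III gives a direct test of a candidate vertex. The sketch below is an illustration with made-up data: it takes the $\nu$ observations whose equations (4) define the zero-complex, forms $x^*_\alpha$ from the remaining observations, solves $M'z = x^*$ and checks $|z_\lambda| < 1$. It assumes that no observation outside the chosen set has a zero residual at the vertex. `solve` is the same hypothetical Gauss-Jordan helper as before, repeated so that the block stands alone.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A k = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # fails if A is singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def vertex_test(X, y, vertex):
    """Return (z, is_unique_minimum) for the zero-complex fixed by the observations in `vertex`."""
    M = [X[i] for i in vertex]
    u = solve(M, [y[i] for i in vertex])            # coordinates of the vertex
    x_star = [Fraction(0)] * len(u)
    for k, (row, yk) in enumerate(zip(X, y)):
        if k in vertex:
            continue
        v = sum(xa * ua for xa, ua in zip(row, u)) - yk
        e = 1 if v > 0 else -1                      # assumes v != 0 off the vertex set
        x_star = [s + e * xa for s, xa in zip(x_star, row)]
    M_transposed = [list(col) for col in zip(*M)]   # M'
    z = solve(M_transposed, x_star)
    return z, all(abs(zi) < 1 for zi in z)


# Hypothetical data: fit y = u1 + u2 * t to four points.
X = [[1, 0], [1, 1], [1, 2], [1, 4]]
y = [0, 2, 1, 4]
print(vertex_test(X, y, (0, 3)))   # line through the first and last points
print(vertex_test(X, y, (0, 1)))   # line through the first two points
```

For these data the line through the first and last points passes the test, while the line through the first two does not.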


8. Minimization for One Dimension. A method for minimization of (2) when there is just one parameter evolves from the monotonicity of slope in that case. Suppose the variates are $w^i$ and $z^i$, and (1) is

(10) $v^i = w^i t - z^i$.

Suppose the variates are arranged in order of $z^i / w^i$, starting with the smallest. The slope of the rth segment (Fig. 1) from the left is

$\sum_{i=1}^{r} |w^i| - \sum_{i=r+1}^{n} |w^i|$.

The minimum occurs when the slope is 0 or changes from negative to positive; that is, when the first sum equals or exceeds the second; or when the first sum equals or exceeds half the total. This is a standard computation. If the change takes place when r = k, then $t = z^k / w^k$ is the value of t giving the minimum.
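The rule of this section is a weighted-median computation, and may be sketched as follows. The function name and the sample data are illustrative only. Observations with $w^i = 0$ are skipped here, since they contribute a constant to the sum.

```python
def line_minimum(w, z):
    """Return a value of t minimizing sum_i |w^i t - z^i| (section 8)."""
    # Order the roots z^i / w^i, carrying |w^i| as the weight of each root.
    pairs = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    total = sum(weight for _, weight in pairs)
    running = 0
    for ratio, weight in pairs:
        running += weight
        if 2 * running >= total:   # first sum equals or exceeds half the total
            return ratio
    raise ValueError("all w are zero: the sum does not depend on t")


print(line_minimum([1, 1, 1], [1, 2, 10]))   # ordinary median of 1, 2, 10
print(line_minimum([2, 1, 1], [2, 5, 7]))    # slope reaches 0 at the first root
```

When the slope is exactly zero on a segment, every point of that segment is a minimum; the sketch returns its left end.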


9. Minimization Procedure for $\nu + 1$ Dimensions. For any continuous function with unique minimum and having the property of Theorem I, the following holds. Let $u_1$ be any point of $E_\nu$. Let $u_{i+1} = u_i + A_i t_i$, where $A_i$ is any direction chosen at random and $t_i$ is the value of t for which the function attains a minimum on the curve $u = u_i + A_i t$. Then the probability is one that $\lim_{i \to \infty} u_i = u_m$, where $u_m$ is a minimum point for the function. If $A_i$ is taken always as the gradient at $u_i$, such a procedure is called the “method of steepest descent” for approaching the minimum point.

Usually the limit is never attained. In this case, however, the minimum is attained. The minimum can be approached as closely as desired, hence a complex incident on the minimum is reached. But the convex point sets of Theorem I surrounding the minimum complex are all similar convex polyhedrons in $E_\nu$, whose corresponding faces are parallel, and the gradients at points on a bend cannot point into a higher dimensional complex on the bend. Hence the sequence of points lies on bends of successively greater degree, and must eventually attain the minimum complex.


TABLE II
Points $u_k$: $u_{k+1} = u_k + g_k t_k$

$u_0$ = (38, −5, −2)
$u_1$ = (37.98202, −4.74828, −1.48457)
$u_2$ = (37.45908, −2.07142, −1.85631)
(2.83333, −2.07142, −1.76191)





TABLE III
Computation of $t_k = z_k / w_k$

Column | (heading illegible) | exceeds at i = | hence $t_k$ =
(10) | 17521 | 16 | .00599334
(15) | 2502 | 2 | .0397792
(20) | 4610 | 10 | .00496545


TABLE IV
Gradients $g_k$ for column (5k + 8)

k | entries (partly illegible)
0 | 3 | 42
1 | −13146 |
2 | −931588 |





The computational procedure is as follows:
1. Select a point $u_0^\alpha$.
2. Determine the gradient, $g_0^\alpha$, from (5).
3. Compute $w_0^i = x^i_\alpha g_0^\alpha$, $z_0^i = y^i - x^i_\alpha u_0^\alpha$.
4. Determine $t_0$ by the method of section 8.
5. Compute $u_1^\alpha = u_0^\alpha + g_0^\alpha t_0$.
6. Determine the complex containing the best gradient by (9), and the gradient $g_1^\alpha$ by (7).
and so proceed to the minimum. This may be finally tested by Theorem III.

Step 5 is unnecessary, since the only use for $u_1$ is to determine $e^i(u_1)$. But $e^i(u_1) = e^i(t_0)$, the latter referring to the computation in step 4. Also, after the first step, it is easier to compute $z^i$ by

$z^i_{k+1} = z^i_k - w^i_k t_k$.
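Steps 2 to 5 and the update rule for $z^i$ can be sketched for a starting point in the interior of a face. This is an illustration with made-up data, not the author's worked example. It covers only the first step; at a bend the gradient must instead be chosen by (7) and (9), which this sketch does not attempt.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def line_minimum(w, z):
    """Weighted-median rule of section 8 for minimizing sum_i |w^i t - z^i|."""
    pairs = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    total = sum(weight for _, weight in pairs)
    running = 0
    for ratio, weight in pairs:
        running += weight
        if 2 * running >= total:
            return ratio
    raise ValueError("all w are zero")


def descent_step(X, y, u):
    """One pass through steps 2-5, starting from u in the interior of a face."""
    z = [yi - dot(row, u) for row, yi in zip(X, y)]       # step 3: z^i = y^i - x^i_a u^a
    e = [1 if -zi >= 0 else -1 for zi in z]               # sign of v^i = -z^i
    g = [-sum(ei * row[a] for ei, row in zip(e, X))       # step 2: equation (5)
         for a in range(len(u))]
    w = [dot(row, g) for row in X]                        # step 3: w^i = x^i_a g^a
    t = line_minimum(w, z)                                # step 4
    u_next = [ua + ga * t for ua, ga in zip(u, g)]        # step 5
    z_next = [zi - wi * t for zi, wi in zip(z, w)]        # z_{k+1} = z_k - w_k t_k
    return u_next, z_next, t


# Hypothetical data and starting point.
X = [[1, 0], [1, 1], [1, 2], [1, 4]]
y = [0, 2, 1, 4]
u1, z1, t0 = descent_step(X, y, [3, 0])
print(u1, z1, t0)
```

The updated $z^i$ agree with $y^i - x^i_\alpha u_1^\alpha$ computed directly, which is why step 5 can be skipped in hand computation.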


10. Example. The computation for (9) is not so great as it would seem, since 
some of the work is duplication and some must be computed anyway for the 
gradient. Even so, for r > 3 it becomes, perhaps, more arduous than its 
contribution would seem to justify. For $\nu > 4$ it is recommended that the
test of (9) be omitted for points on bends of third degree or greater, and the 
final test of Theorem III be applied at the end of the work. If this test shows 
the minimum has not been reached, the complex in which lies the best gradient 
will be indicated at the same time. 

The minimum number of steps is 0. The maximum number is tremendous 
but finite. The expected number is probably a little greater than $\nu$.

In Tables I to IV, the method is applied to the problem used by Rhodes to 
illustrate his method. The independent variates are shown in columns (2), (3), 
(4), Table I, the dependent variate in column (5). The only other original 
datum is the initial point, selected by guess, shown in line 1, Table II. Since 
slightly different formulas were used in the computation, the signs of cols. 
(6), (8), (11), (16), (18) are reversed, and the gradients in Table IV are 
multiplied by constants. As they are used only for directions, this does not
matter.


A STUDY OF A UNIVERSE OF n FINITE POPULATIONS WITH
APPLICATION TO MOMENT-FUNCTION ADJUSTMENTS
FOR GROUPED DATA


By Joseph A. Pierce


The object of this paper is to study the case of a universe of n finite popula- 
tions, considering both the expectations of population moment-functions and 
the moments of sample moments, and to make applications of the results which 
may be of interest to mathematical statisticians. The sampling formulas which 
are derived reduce to the usual infinite or finite sampling formulas, under 
appropriate assumptions. Also a method is given whereby finite sampling 
formulas may be transformed into the corresponding infinite sampling formulas. 

The general methods and formulas which are given in Part I for the expecta- 
tions of population moment-functions are used, in Part II, to find the expecta- 
tions of moments of a distribution of discrete data grouped in “k groupings 
of k’’. 


I. A STUDY OF A UNIVERSE OF n FINITE POPULATIONS


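Let $_nU_N$ be a universe composed of the set of populations $_rX$ $(r = 1, 2, \dots, n)$, each population $_rX$ consisting of a finite number of discrete variates $_rx_j$ $(j = 1, 2, \dots, N)$, $(N > n)$. The tth moment of $_rX$ is denoted by $_r\mu_t$. The tth central moment of $_rX$ is denoted by $_r\bar\mu_t$. The tth moment and the tth central moment of $_nU_N$ are respectively denoted by $\mu_t$ and $\bar\mu_t$. The expected value of a variable y is denoted by E(y). We have

(1.1)
${}_r\mu_t = E({}_rx_j^t) = \frac{1}{N}\sum_{j=1}^{N} {}_rx_j^t, \qquad {}_r\bar\mu_t = E({}_rx_j - {}_r\mu_1)^t = \frac{1}{N}\sum_{j=1}^{N} ({}_rx_j - {}_r\mu_1)^t,$

$\mu_{1:\mu_t} = E({}_r\mu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\mu_t, \qquad \mu_{1:\bar\mu_t} = E({}_r\bar\mu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\bar\mu_t,$

$\mu_{s_1 s_2 \cdots s_v : \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}} = E({}_r\mu_{t_1}^{s_1}\, {}_r\mu_{t_2}^{s_2} \cdots {}_r\mu_{t_v}^{s_v}),$

$\bar\mu_{s_1 s_2 \cdots s_v : \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}} = E[({}_r\mu_{t_1} - \mu_{t_1})^{s_1} ({}_r\mu_{t_2} - \mu_{t_2})^{s_2} \cdots ({}_r\mu_{t_v} - \mu_{t_v})^{s_v}].$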


We also note that p,s.---sy:m,,ue,---ue, MAY be written pan. .rtu{1ut2---uee - 


1. The expected value of moments and central moments. It follows easily from (1.1) that

(1.2) $\mu_{1:\mu_t} = \mu_t$.

From the usual formula for central moments in terms of moments, we get

(1.3) $\mu_{1:\bar\mu_t} = \sum_{i=0}^{t} (-1)^i \binom{t}{i} \mu_{i1:\mu_1\mu_{t-i}}$.

Terms of the form $\mu_{i1:\mu_1\mu_{t-i}}$ may be evaluated by use of the well known formulas [20; p. 58] for changing from moments to central moments in the case of a multivariate distribution. Two of these formulas are given below.

(1.4)
$\bar\mu_{11:\mu_a\mu_b} = \mu_{11:\mu_a\mu_b} - \mu_{10:\mu_a\mu_b}\,\mu_{01:\mu_a\mu_b}$.

$\bar\mu_{111:\mu_a\mu_b\mu_c} = \mu_{111:\mu_a\mu_b\mu_c} - \mu_{110:\mu_a\mu_b\mu_c}\,\mu_{001:\mu_a\mu_b\mu_c} - \mu_{101:\mu_a\mu_b\mu_c}\,\mu_{010:\mu_a\mu_b\mu_c} - \mu_{011:\mu_a\mu_b\mu_c}\,\mu_{100:\mu_a\mu_b\mu_c} + 2\mu_{100:\mu_a\mu_b\mu_c}\,\mu_{010:\mu_a\mu_b\mu_c}\,\mu_{001:\mu_a\mu_b\mu_c}$.


We find that 












a! a Pp r 
(1.5) Miligyp,-) = 2 pilpolri ire! Bp yrysuywe—< Mlzus Aue —< ’ 


where pip is a two-part partition of 7 and 7 + re = 1. 
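Using (1.3) and (1.5), we get

(1.6) $\mu_{1:\bar\mu_2} = \bar\mu_2 - \bar\mu_{2:\mu_1}$.

(1.7) $\mu_{1:\bar\mu_3} = \bar\mu_3 - 3\bar\mu_{11:\mu_2\mu_1} + 6\mu_1\bar\mu_{2:\mu_1} + 2\bar\mu_{3:\mu_1}$.

(1.8) $\mu_{1:\bar\mu_4} = \bar\mu_4 + 6(\bar\mu_2 - 2\mu_1^2)\bar\mu_{2:\mu_1} - 12\mu_1\bar\mu_{3:\mu_1} + 12\mu_1\bar\mu_{11:\mu_2\mu_1} - 4\bar\mu_{11:\mu_3\mu_1} + 6\bar\mu_{21:\mu_1\mu_2} - 3\bar\mu_{4:\mu_1}$.

etc.

If the n populations are identical, it is evident from the definition of $\bar\mu_{s:\mu_t}$ that, for all finite t,

$\mu_{1:\bar\mu_t} = \bar\mu_t$.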





2. The expected value of Thiele seminvariants. If the ith Thiele sem- 
invariant is denoted by \,;, then 


(—1)"*t'(p — 1)! 


(1.9) Ma, = 2 Bi!sq! «+ 8 1(2!)*2(3!)* --- (vl)* May89---8y:mime++ +My? 


the summation being taken for all positive integers s;(i = 1, 2, --- v), for which 
p=Ea, t= Du. 
i=l tl 


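Terms of the form $\mu_{s_1 s_2 \cdots s_v : \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}}$ are evaluated by (1.4). We have

(1.10) $\mu_{1:\lambda_2} = \lambda_2 - \bar\mu_{2:\mu_1}$.

(1.11) $\mu_{1:\lambda_3} = \lambda_3 - 3\bar\mu_{11:\mu_2\mu_1} + 6\lambda_1\bar\mu_{2:\mu_1} + 2\bar\mu_{3:\mu_1}$.

(1.12) $\mu_{1:\lambda_4} = \lambda_4 + 12[\lambda_2 - 2\lambda_1^2]\bar\mu_{2:\mu_1} - 24\lambda_1\bar\mu_{3:\mu_1} + 24\lambda_1\bar\mu_{11:\mu_2\mu_1} - 4\bar\mu_{11:\mu_3\mu_1} + 12\bar\mu_{21:\mu_1\mu_2} - 3\bar\mu_{2:\mu_2} - 6\bar\mu_{4:\mu_1}$.

etc.

If the n populations are identical then, for all finite t,

$\mu_{1:\lambda_t} = \lambda_t$.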

3. Generalized sampling. It follows from definition that all rational isobaric 
moment-functions have the property that they may be expressed in terms of 
power sums and power product sums with certain coefficients. Of the power 
sums and power product sums which enter a sampling formula only the power 
product sums take different forms depending on the law of variate selection. 
Now, there are two possible courses which may be followed by one who wishes to 
derive sampling formulas for the case of a single population. 

1. One may decide in advance on the law which he wishes to govern the 
selection of variates which enter the sample. Then he may apply this law in 
the evaluation, in terms of moments, of every power product term as it occurs 
in each formula which is derived. 

2. One may derive the formulas for sampling under the condition that the 
law is unspecified, thereby obtaining formulas which are capable of being 
interpreted in terms of laws that are decided upon later. 

We illustrate the two possible courses by considering the formula,

(1.13) $\bar\mu_{2:z} = \frac{r}{s}\sum x_i^2 + \frac{2r(r-1)}{s(s-1)}\sum_{i<j} x_i x_j$,

which Carver [12; p. 102] obtains for the case of finite sampling without replacements. Here r = the number in the sample, s = the number in the parent population and $z_i$ = the algebraic sum of the variates of the ith sample. Later, by evaluating $\sum x_i^2$ and $\sum x_i x_j$ in terms of moments, he finds

(1.14) $\bar\mu_{2:z} = \frac{r(s-r)}{s-1}\,\bar\mu_2$.

(It should be noted that Carver [12; p. 115] obtained the corresponding formula for infinite sampling by letting $s \to \infty$.)

The preceding development is entirely in accord with the first of the courses stated above. It is also the standard procedure and is the course followed by such writers as Isserlis [2], Neyman [6], Church [7], Pepper [11] and Dwyer [20], in deriving finite sampling formulas. Also, it is the course followed by such authors as “Student” [1], Tchouproff [3], Church [5], Craig [9], Fisher [10], and Georgesque [13] for the case of sampling from an infinite population.

However, in (1.13), it is possible to employ the definition,

$\frac{2}{s(s-1)}\sum_{i<j} x_i x_j = \bar\mu_{11}$.

Then (1.13) becomes

(1.15) $\bar\mu_{2:z} = r\bar\mu_2 + r(r-1)\bar\mu_{11}$.

Formula (1.15) may be interpreted as holding for either finite or infinite sampling, depending on the interpretation which is given to $\bar\mu_{11}$. It may be easily shown that, if the sampling is from a limited supply, $\bar\mu_{11} = -\frac{1}{s-1}\bar\mu_2$ and (1.15) reduces to (1.14). If the sampling is from an infinite supply, $\bar\mu_{11}$ becomes $\bar\mu_1^2$ and therefore

$\bar\mu_{2:z} = r\bar\mu_{2:x}$,

which is the formula [12; p. 115] that corresponds, in the infinite case, to (1.14).
Thus, either of the two courses is possible in the case of sampling from a single 
population. However, if one wishes to get general formulas which hold for both 
infinite and finite sampling, he should follow the second course. Similarly, in 
order to obtain generalized sampling formulas where the relations between the 
variates are unspecified and the populations are assumed to be different, the 
second course should be followed. 

It appears that Tchouproff [3], [4] was the first to approach the sampling 
problem from such a general point of view. However, his methods of derivation 
are quite complicated and his results, in general, are difficult to apply to a given 
problem [5], [8]. 

Samples of n are formed from $_nU_N$ by choosing one variate from each of the n populations. A typical sample is

${}_1x_{i_1}, {}_2x_{i_2}, {}_3x_{i_3}, \dots, {}_rx_{i_r}, \dots, {}_nx_{i_n}$.


We define [4; p. 472] 
1 n 
K iggings ip =i 


7 jATk 


(1.16) = ryrg- ++ ty Miyte-+-tyy 





ty te Re, ae ti te atv 
riVi, reli, aan ryLiy, ai E( +i}, riz, sates riz) 







v 1 
n >> Tytg:++TyMtyte---ty = no Sy Tyg: ++ Tybleyte---ty = Meyte-+-ty» 









where k represents the number of possible terms of the given form; S, means » 
. (v) 
times the sum for unequal values of ,7r2--- 7 and n” = n(n —1)--: 


(n—v+1). 







4. Moments and product moments of sample moments. The tth moment of the jth sample is denoted by $_jm_t$. The sth moment of $_jm_t$ for all j is denoted by $'\mu_{s:m_t}$, where the prime indicates that the moments of the universe are measured about a fixed point. It follows that

(1.17) ${}_jm_t = \frac{1}{n}\sum_{r=1}^{n} {}_rx_{i_r}^t$ and $'\mu_{s:m_t} = E({}_jm_t)^s$.

Also, the general product moment, in which the variates of both the sample and the universe are measured about a fixed point, is defined by

(1.18) $'\mu_{s_1 s_2 \cdots s_v : m_{t_1} m_{t_2} \cdots m_{t_v}} = E({}_jm_{t_1}^{s_1}\, {}_jm_{t_2}^{s_2} \cdots {}_jm_{t_v}^{s_v})$.

As an illustration of the methods used to derive the formulas of this section, consider a special case of (1.18) when $s_1 = 2$ and $s_i = 0$, $(i = 2, 3, \dots, v)$. Then

$'\mu_{2:m_t} = \frac{1}{n^2} E\left[\sum_{r=1}^{n} {}_rx_{i_r}^t\right]^2 = \frac{1}{n^2} E\left[\sum_{r=1}^{n} {}_rx_{i_r}^{2t} + S_2\, {}_{r_1}x_{i_1}^t\, {}_{r_2}x_{i_2}^t\right] = \frac{1}{n^2}\left[\sum_{r=1}^{n} {}_r\mu_{2t} + S_2\, {}_{r_1 r_2}\mu_{t,t}\right]$.

Therefore, by (1.1), (1.2) and (1.16), we get

(1.19) $'\mu_{2:m_t} = \frac{1}{n^2}\left[n\mu_{2t} + n^{(2)}\mu_{t,t}\right]$.
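Formula (1.19) can be checked by brute force for a small universe. The sketch below is an illustration under one added assumption that the generalized formula itself does not require: the variate from each population is drawn independently of the others, so that each joint moment is a product of single-population moments. The populations are invented.

```python
from fractions import Fraction
from itertools import permutations, product


def moment(pop, k):
    """kth moment of one population about the origin."""
    return Fraction(sum(x ** k for x in pop), len(pop))


def check_1_19(populations, t):
    """Return (E[(m_t)^2] by enumeration, right-hand side of (1.19))."""
    n = len(populations)
    samples = list(product(*populations))           # one variate from each population
    lhs = sum(Fraction(sum(x ** t for x in smp), n) ** 2 for smp in samples) / len(samples)
    n_mu_2t = sum(moment(p, 2 * t) for p in populations)                # n * mu_{2t}
    n2_mu_tt = sum(moment(p, t) * moment(q, t)                          # n^(2) * mu_{t,t}
                   for p, q in permutations(populations, 2))            # ordered pairs r1 != r2
    rhs = (n_mu_2t + n2_mu_tt) / n ** 2
    return lhs, rhs


populations = [[1, 2, 3], [0, 4, 5], [2, 2, 9]]   # hypothetical universe, n = 3, N = 3
for t in (1, 2):
    print(check_1_19(populations, t))
```

Here $n^{(2)}\mu_{t,t}$ is formed as the sum over ordered pairs of distinct populations, which is what $S_2$ contributes in the derivation above.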


Using the formulas [20; p. 34] relating products of power sums and power products to expand expressions of the type $E({}_jm_{t_1}^{s_1}\, {}_jm_{t_2}^{s_2} \cdots {}_jm_{t_v}^{s_v})$, we give, in the tables below, formulas for moments and product moments of sample moments through weight six. The number in a cell and the coefficient, in the same column, at the top of the table should be taken as the coefficient of the moment which is found in the same vertical division. The coefficients in the vertical division are coefficients of the entire right members of the formulas for the respective moments.

Terms of the form $\mu_{t_1, t_2, \dots, t_v}$, if $t_1 = t_2 = \cdots = t_v = t$, are sometimes written $\mu_{t^v}$.

The numbers in the cells of the tables are identical with the numbers in the 
cells of the tables given by Dwyer [19; p. 30] for the expected value of partition 
products. 


5. Moments of central moments of samples of n. The tth central moment of the jth sample is denoted by $_j\bar m_t$. Then,

(1.20) ${}_j\bar m_t = \frac{1}{n}\sum_{r=1}^{n} ({}_rx_{i_r} - {}_jm_1)^t$

and

(1.21) $'\mu_{s:\bar m_t} = E\left[\frac{1}{n}\sum_{r=1}^{n} ({}_rx_{i_r} - {}_jm_1)^t\right]^s$.


TABLE I
Moment | Coef. | Right member
$'\mu_{1:m_1}$ | $n^{-1}$ | $n\mu_1$
$'\mu_{2:m_1}$ | $n^{-2}$ | $n\mu_2 + n^{(2)}\mu_{1^2}$
$'\mu_{1:m_2}$ | $n^{-1}$ | $n\mu_2$
$'\mu_{3:m_1}$ | $n^{-3}$ | $n\mu_3 + 3n^{(2)}\mu_{2,1} + n^{(3)}\mu_{1^3}$
$'\mu_{11:m_1m_2}$ | $n^{-2}$ | $n\mu_3 + n^{(2)}\mu_{2,1}$
$'\mu_{4:m_1}$ | $n^{-4}$ | $n\mu_4 + 4n^{(2)}\mu_{3,1} + 3n^{(2)}\mu_{2,2} + 6n^{(3)}\mu_{2,1^2} + n^{(4)}\mu_{1^4}$
$'\mu_{21:m_1m_2}$ | $n^{-3}$ | $n\mu_4 + 2n^{(2)}\mu_{3,1} + n^{(2)}\mu_{2,2} + n^{(3)}\mu_{2,1^2}$
$'\mu_{11:m_1m_3}$ | $n^{-2}$ | $n\mu_4 + n^{(2)}\mu_{3,1}$
$'\mu_{2:m_2}$ | $n^{-2}$ | $n\mu_4 + n^{(2)}\mu_{2,2}$
$'\mu_{5:m_1}$ | $n^{-5}$ | $n\mu_5 + 5n^{(2)}\mu_{4,1} + 10n^{(2)}\mu_{3,2} + 10n^{(3)}\mu_{3,1^2} + 15n^{(3)}\mu_{2^2,1} + 10n^{(4)}\mu_{2,1^3} + n^{(5)}\mu_{1^5}$
[The remaining rows of weights five and six are not legible in this copy.]


After writing $({}_rx_{i_r} - {}_jm_1)^t$ as the sum of the general term of a binomial series and then expanding the resulting right member of (1.21) as a product of power sums [20; p. 19], we get


s! - ey /¢\" 
/ - e 
: = wa —1 . . 
ine eg ee sd dg (—1) (5) is 
: * wip 


2 
(1.22) ti xfiggs-* 


i+w 
/ 
iy Bryrg-+-rypimy— 5 smp—ggssomp— i my 


v v 
where >, rt; = 8, Zz i;7; = pand m, m2, --- are the numbers of the repeated 
j=l 7=1 


parts of s. 
The mean of the tth central moment takes the following simple form,

(1.23) $'\mu_{1:\bar m_t} = \sum_{i=0}^{t} (-1)^i \binom{t}{i}\, {}'\mu_{i1:m_1 m_{t-i}}$,

where the moments in the right member of (1.23) through weight six are given in the tables of section four. Also,

(1.24) $'\mu_{2:\bar m_2} = {}'\mu_{2:m_2} - 2\,'\mu_{21:m_1m_2} + {}'\mu_{4:m_1}$.

(1.25) $'\mu_{3:\bar m_2} = {}'\mu_{3:m_2} - 3\,'\mu_{22:m_1m_2} + 3\,'\mu_{41:m_1m_2} - {}'\mu_{6:m_1}$.

(1.26) $'\mu_{2:\bar m_3} = {}'\mu_{2:m_3} + 9\,'\mu_{22:m_1m_2} + 4\,'\mu_{6:m_1} - 6\,'\mu_{111:m_1m_2m_3} + 4\,'\mu_{31:m_1m_3} - 12\,'\mu_{41:m_1m_2}$.

After substituting from the tables of section four, (1.23) through (1.26) become

(1.27) $'\mu_{1:\bar m_2} = \frac{n^{(2)}}{n^2}\,(\mu_2 - \mu_{1^2})$.

(1.28) $'\mu_{1:\bar m_3} = \frac{n^{(3)}}{n^3}\,[\mu_3 - 3\mu_{2,1} + 2\mu_{1^3}]$.

(1.29) $'\mu_{1:\bar m_4} = \frac{1}{n^4}\,[n^{(2)}(n^2 - 3n + 3)(\mu_4 - 4\mu_{3,1}) + 3n^{(2)}(2n - 3)\mu_{2,2} + 3n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})]$.

(1.30) $'\mu_{1:\bar m_5} = \frac{1}{n^5}\,[n^{(3)}(n^2 - 2n + 2)(\mu_5 - 5\mu_{4,1}) + 10n^{(3)}(n - 2)\mu_{3,2} + 10n^{(3)}(n^2 - 3n + 4)\mu_{3,1^2} - 30n^{(3)}(n - 2)\mu_{2^2,1} - 10n^{(4)}(n - 4)\mu_{2,1^3} + 4n^{(5)}\mu_{1^5}]$.

(1.31) $'\mu_{1:\bar m_6} = \frac{1}{n^6}\,[n^{(2)}(n^4 - 5n^3 + 10n^2 - 10n + 5)(\mu_6 - 6\mu_{5,1}) + 15n^{(2)}(n^3 - 4n^2 + 7n - 5)\mu_{4,2} - 10n^{(2)}(2n^2 - 6n + 5)\mu_{3,3} + 15n^{(3)}(n^3 - 4n^2 + 6n - 5)\mu_{4,1^2} - 60n^{(3)}(n^2 - 4n + 5)\mu_{3,2,1} + 15n^{(3)}(3n - 5)\mu_{2^3} - 20n^{(4)}(n^2 - 3n + 5)\mu_{3,1^3} + 45n^{(4)}(2n - 5)\mu_{2^2,1^2} + 15n^{(5)}(n - 5)\mu_{2,1^4} - 5n^{(6)}\mu_{1^6}]$.



















(1.32) $'\mu_{2:\bar m_2} = \frac{1}{n^4}\,[n(n - 1)^2(\mu_4 - 4\mu_{3,1}) + n^{(2)}(n^2 - 2n + 3)\mu_{2,2} - n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})]$.


‘sm, = Sinn — 1)°(us — 6ps1) + 3n(n — 1)(n? — 2n + 5) ae 
— 2n® (3n* — 6n + 5)uss + n(n’ — 3n® + On — 15) 
(1.33) — 3n™(n — 1)(n — 5)pare — 12n(n? — 4n + 5)ys01 
+ 4n(3n — 5)ysax — 3n\(n? — 6n + 15)ya2,12 
+ n©(3y214 — wis). 


bn . 
‘Wen, = ayn (n — 1)°(n — 2)(us — 6ys,1) — 3n?(n — 2)°(2n — 5) yas 

+ n(n — 2)(n? — 2n + 10)yss 
(1.34) — 6n(n — 2)(n? — 6n + 20)ys21 + 3n (nm — 2)(7n — 10) ysis 


+ 3n(3n® — 12m + 20)ye3 + 4n(n — 2)(n — 10)ysis 
+ On (n? — 8n + 20)p2212 — 4n® (Syeis — pis)). 





6. The variance of the variance of samples of n. The variance of the variance of samples of n, when the moments of the universe are measured about a fixed point, is defined as

(1.35) $'\bar\mu_{2:\bar m_2} = {}'\mu_{2:\bar m_2} - ('\mu_{1:\bar m_2})^2$.

Therefore, from (1.27) and (1.32),

(1.36) $'\bar\mu_{2:\bar m_2} = \frac{1}{n^4}\,[n(n - 1)^2(\mu_4 - 4\mu_{3,1}) + n^{(2)}(n^2 - 2n + 3)\mu_{2,2} - n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})] - \left(\frac{n^{(2)}}{n^2}\right)^2 (\mu_2 - \mu_{1^2})^2$.

Tchouproff [4; p. 492] gave a formula (8) for the variance of the sample variance but his result is unwieldy due to the fact that moments of the universe are measured about the mean.


7. Conventional infinite sampling formulas derived from generalized sampling formulas. The term “infinite sampling” is to be interpreted as meaning: sampling from an unlimited supply or sampling from a limited supply with repetitions permitted. In each of these situations the variates are independent [5; p. 79].

First, it is assumed that the n populations are identical, that is, ${}_1X = {}_2X = \cdots = {}_nX$. This assumption results in the fact that, for a fixed t, ${}_1\mu_t = {}_2\mu_t = \cdots = {}_n\mu_t$ and ${}_1\bar\mu_t = {}_2\bar\mu_t = \cdots = {}_n\bar\mu_t$. Therefore, under the assumption of identical populations, every moment may be interpreted as either the moment of n identical populations or as the moment of a single population. The only other assumption is that the sampling is “infinite”.

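From the condition of independence [3; p. 141], we have

$E\left({}_{r_1}x^{t_1}\,{}_{r_2}x^{t_2}\cdots{}_{r_v}x^{t_v}\right) = \left(E\,{}_{r_1}x^{t_1}\right)\left(E\,{}_{r_2}x^{t_2}\right)\cdots\left(E\,{}_{r_v}x^{t_v}\right).$

Therefore,

${}_{r_1 r_2\cdots r_v}\mu_{t_1 t_2\cdots t_v} = {}_{r_1}\mu_{t_1}\,{}_{r_2}\mu_{t_2}\cdots{}_{r_v}\mu_{t_v}.$

Combining the condition of independence with that of identical populations, we
have

(1.37) $\frac{1}{n^{(v)}}\,S\;{}_{r_1 r_2\cdots r_v}\mu_{t_1 t_2\cdots t_v} = \frac{1}{n^{(v)}}\,S\;{}_{r_1}\mu_{t_1}\,{}_{r_2}\mu_{t_2}\cdots{}_{r_v}\mu_{t_v} = \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}.$

By (1.16) and (1.37), we may write

(1.38) $\mu_{t_1 t_2\cdots t_v} = \mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}.$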


Since the only terms of the generalized sampling formulas which are affected
by the assumption of “infinite sampling” are those of the form
$\mu_{t_1 t_2\cdots t_v}$, the problem of obtaining conventional infinite sampling
formulas from generalized sampling formulas is, in practice, a mechanical one.
Simply write terms of the form $\mu_{t_1 t_2\cdots t_v}$, which appear in a
generalized sampling formula, as $\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}$ and one
automatically obtains the corresponding infinite sampling formula.

As an illustration of the method, consider the generalized sampling formula 
(1.36) for the variance of the sample variance. When (1.38) is utilized to change 
it into the corresponding infinite sampling formula, (1.36) becomes 


(1.39) ${}'\bar\mu_{2:m_2} = \frac{n-1}{n^3}\left[(n-1)(\mu_4 - 4\mu_3\mu_1) - (n-3)\mu_2^2 + 2(2n-3)(2\mu_2\mu_1^2 - \mu_1^4)\right],$


which is the usual formula [20; p. 75] for the variance of the sample variance
when the moments of the universe are measured about a fixed point. If it is
assumed that the moments of ${}_nU_N$ are measured about the mean, formula (1.39)
becomes


(1.40) $\bar\mu_{2:m_2} = \frac{n-1}{n^3}\left[(n-1)\bar\mu_4 - (n-3)\bar\mu_2^2\right],$


which was published by “Student” [1; p. 3] in 1908.
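
Formula (1.39) can be checked by direct enumeration. The following is a minimal sketch, assuming a small hypothetical parent population with equal weights and taking the fixed point to be 0; the function names are illustrative only and are not part of the paper.

```python
from fractions import Fraction
from itertools import product

def raw_moment(values, t):
    """t-th moment about the fixed point 0 of an equally weighted population."""
    return Fraction(sum(v ** t for v in values), len(values))

def sample_variance(sample):
    """Variance of a sample with divisor n (the quantity m_2 of the text)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n

def var_of_variance_enumerated(values, n):
    """Exact variance of m_2 over all ordered samples drawn with repetitions."""
    m2 = [sample_variance(s) for s in product(values, repeat=n)]
    mean = sum(m2) / len(m2)
    return sum((v - mean) ** 2 for v in m2) / len(m2)

def var_of_variance_formula(values, n):
    """Right-hand member of (1.39)."""
    mu1, mu2, mu3, mu4 = (raw_moment(values, t) for t in (1, 2, 3, 4))
    bracket = ((n - 1) * (mu4 - 4 * mu3 * mu1)
               - (n - 3) * mu2 ** 2
               + 2 * (2 * n - 3) * (2 * mu2 * mu1 ** 2 - mu1 ** 4))
    return Fraction(n - 1, n ** 3) * bracket

population = [0, 1, 3, 7]   # hypothetical values, chosen only for illustration
n = 3
assert var_of_variance_enumerated(population, n) == var_of_variance_formula(population, n)
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximate agreement.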
8. Conventional finite sampling formulas derived from generalized sampling
formulas. The term “finite sampling” is to be interpreted as meaning: sampling
from a limited supply when repetitions are not permitted.

In order to reduce generalized sampling formulas to the corresponding formulas
for finite sampling, the assumptions are made that the n populations are identical
and that N and n are finite, N > n. The selection of variates which enter each
sample is restricted in the following manner. If a variate having a given
post-subscript is chosen, then no other variate having the same post-subscript
may be chosen for the same sample.

Now it is evident that terms of the form $\mu_{t_1 t_2\cdots t_v}$ must be
redefined on the basis of the preceding assumptions. From the expansions
[20; p. 32] of power product sums in terms of products of power sums, we get the
formulas for $\mu_{t_1 t_2\cdots t_v}$ which are given in the following tables.

The formulas in the tables of this section are called transformation formulas for 
finite sampling or more briefly transformation formulas. 

The transformation of generalized sampling formulas into corresponding
finite sampling formulas is illustrated by the substitution of

$\frac{N^2\mu_1^2 - N\mu_2}{N^{(2)}}$ for $\mu_{1,1}$

in (1.27). We get


(1.41) ${}'\mu_{1:m_2} = \frac{N(n-1)}{n(N-1)}\left[\mu_2 - \mu_1^2\right],$


which is the well-known finite sampling formula for the mean of the variance of 
samples of n. 
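
The finite sampling formula (1.41) may be verified by enumerating every sample of n drawn without repetitions. The sketch below assumes a hypothetical parent population of N = 6 values and the fixed point 0; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def mean_of_sample_variance(population, n):
    """Exact mean of m_2 (divisor n) over all samples of n without repetitions."""
    samples = list(combinations(population, n))
    total = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        total += sum((x - mean) ** 2 for x in sample) / n
    return total / len(samples)

def formula_1_41(population, n):
    """Right-hand member of (1.41), moments taken about the fixed point 0."""
    N = len(population)
    mu1 = Fraction(sum(population), N)
    mu2 = Fraction(sum(x * x for x in population), N)
    return Fraction(N * (n - 1), n * (N - 1)) * (mu2 - mu1 ** 2)

population = [2, 3, 3, 5, 8, 13]   # hypothetical finite parent population
for n in range(2, 6):
    assert mean_of_sample_variance(population, n) == formula_1_41(population, n)
```

Because `combinations` selects by position, repeated values in the population are treated as distinct variates, which is what sampling without repetitions requires.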

From this and the preceding section it is evident that the generalized sampling
formulas may be considered as formulas for either infinite or finite sampling
depending upon the interpretation given to terms of the form $\mu_{t_1 t_2\cdots t_v}$.


9. Transformation of infinite sampling formulas into corresponding finite 
sampling formulas. It is a well-known fact that infinite sampling formulas may 
be obtained from those for finite sampling by letting the size of the parent popula- 
tion become infinite. But, prior to this paper, apparently no one has presented a 
method of obtaining finite sampling formulas from infinite sampling formulas. 
However, by making use of the relations between finite, infinite, and generalized 
sampling, we shall demonstrate that it is possible to transform any infinite 
sampling formula into the corresponding finite sampling formula. 

Since the infinite sampling formulas are obtained from the generalized sam- 
pling formulas by replacing 


$\mu_{t_1 t_2\cdots t_v}$ by $\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}$,


it follows that generalized sampling formulas may be obtained from the infinite
formulas by replacing

(1.42) $\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}$ by $\mu_{t_1 t_2\cdots t_v}$.


However, it must be emphasized that the application of (1.42) demands formulae
which are expressed in terms of moments of sample moments rather than central
moments of sample moments (although the sample moments may be measured
about a fixed point or about the mean) and the moments of the universe must be
measured about a fixed point. The reason for these restrictions is to insure that
each term is accounted for individually.

After replacements (1.42) are made in the formula for sampling from an 
infinite population, the resulting formula is the corresponding generalized one. 
The step to the corresponding finite sampling formulas is simply the one outlined 
in section eight, namely, the use of the transformation formulas. 

We shall consider, as the first illustration, the infinite sampling formula for 
the mean of the sample variance when the moments of the parent population are 
measured about the mean. The formula is 


(1.43) $\mu_{1:m_2} = \frac{n-1}{n}\bar\mu_2.$

When (1.43) is expressed in terms of moments of the parent population about a 
fixed point, we have 


(1.44) ${}'\mu_{1:m_2} = \frac{n-1}{n}\left[\mu_2 - \mu_1^2\right].$

Following (1.42), $\mu_1^2$ is replaced by $\mu_{1,1}$ and (1.44) becomes (1.27).
The use of the transformation formula for $\mu_{1,1}$ gives (1.41) which, when the
moments of the parent population are measured about the mean, becomes


(1.45) $\mu_{1:m_2} = \frac{N(n-1)}{n(N-1)}\bar\mu_2.$


Infinite sampling formulas expressed in terms of moment-functions may be
similarly transformed into the corresponding finite sampling formulas. For
example, Craig [9; p. 57] gives the second Thiele seminvariant of the variance
of samples as


(1.46) $\lambda_{2:m_2} = \frac{(n-1)^2}{n^3}\lambda_4 + \frac{2(n-1)}{n^2}\lambda_2^2.$


First, we express (1.46) in terms of moments about a fixed point by use of the 
formulas relating Thiele seminvariants and moments [9; p. 12]. We also recall 
that the resulting formula should be expressed in terms of moments of sample 
moments rather than in terms of central moments of sample moments. We 
obtain 


(1.47) ${}'\mu_{2:m_2} = \frac{n-1}{n^3}\left[(n-1)\mu_4 - 4(n-1)\mu_3\mu_1 + (n^2 - 2n + 3)\mu_2^2 - 2(n-2)(n-3)\mu_2\mu_1^2 + (n-2)(n-3)\mu_1^4\right].$





The next step is to transform (1.47) into the corresponding generalized sampling
formula by use of (1.42). We obtain (1.32). Since we desire to obtain the
finite sampling formula which exactly corresponds to (1.46), it is necessary to
transform (1.32) from the second moment of $m_2$ to the variance of $m_2$ and we
get (1.36). Next the transformation formulas are applied to (1.36). When the
moments of the parent population are measured about the mean and are replaced
by Thiele seminvariants, (1.36) becomes

(1.48) $\lambda_{2:m_2} = \frac{N(N-n)(n-1)}{n^3(N-1)^2(N-2)(N-3)}\left[(N-1)(Nn - N - n - 1)\lambda_4 + 2(N^2n - 3Nn - 3N + 3n + 3)\lambda_2^2\right].$


Formula (1.48) gives the second Thiele seminvariant of the variance of samples of
n drawn from a finite parent population of N. When $N \to \infty$ in (1.48), we
obtain immediately (1.46).
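
A numerical check of (1.48) is sketched below. It assumes that the seminvariants of the parent population are computed with divisor N, so that $\lambda_2 = \bar\mu_2$ and $\lambda_4 = \bar\mu_4 - 3\bar\mu_2^2$; the populations and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def central_moment(values, t):
    """t-th moment about the mean, with divisor equal to the number of values."""
    size = len(values)
    mean = Fraction(sum(values), size)
    return sum((v - mean) ** t for v in values) / size

def variance_of_sample_variance(population, n):
    """Exact variance of m_2 over all samples of n drawn without repetitions."""
    m2 = [central_moment(s, 2) for s in combinations(population, n)]
    mean = sum(m2) / len(m2)
    return sum((v - mean) ** 2 for v in m2) / len(m2)

def formula_1_48(population, n):
    """Right-hand member of (1.48); requires N > 3."""
    N = len(population)
    lam2 = central_moment(population, 2)
    lam4 = central_moment(population, 4) - 3 * lam2 ** 2
    factor = Fraction(N * (N - n) * (n - 1),
                      n ** 3 * (N - 1) ** 2 * (N - 2) * (N - 3))
    bracket = ((N - 1) * (N * n - N - n - 1) * lam4
               + 2 * (N * N * n - 3 * N * n - 3 * N + 3 * n + 3) * lam2 ** 2)
    return factor * bracket

population = [1, 2, 2, 4, 7, 9]   # hypothetical parent population, N = 6
for n in range(2, 6):
    assert variance_of_sample_variance(population, n) == formula_1_48(population, n)
```

The factors (N - n) and (n - 1) make the result vanish when the sample exhausts the population or consists of a single variate, as it should.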

It is generally true that infinite sampling formulas are more easily derived than 
are the corresponding finite sampling formulas. The methods of this section 
make it possible to derive the desired sampling formulas for the infinite parent 
population and then transform these infinite sampling formulas into the corre- 
sponding finite sampling formulas. 


II. MOMENT FUNCTION ADJUSTMENTS FOR GROUPED DATA


A given distribution of discrete variates may be grouped in “k groupings of k”.
We desire to find the correction which eliminates the error made in replacing a 
given moment of the original distribution by the average of the corresponding 
moments of the k grouped-distributions. 

Formulas for the adjustments for moments of a grouped-distribution of 
discrete variates were first given (without proof) in the Editorial of Vol. I, No. 1 
of the Annals of Mathematical Statistics. Later, more satisfactory derivations 
of adjustment formulas were given by Abernethy [24], Craig [25], and Carver [26].
However, it was observed by Carver [26; p. 162] that the developments of
Abernethy and Craig are adjustments about a fixed point and that they fail to
hold for the case of expectations of central moments if we accept the definition


$\mu_{1:\bar\mu_t} = \frac{1}{k}\sum_{r=1}^{k}{}_r\bar\mu_t, \quad (t = 2, 3, \cdots).$


Here ${}_r\bar\mu_t$ represents the tth central moment of the rth
grouped-distribution. The formula for the true value of $\mu_{1:\bar\mu_2}$ was
supplied by Carver [26; p. 162] but he did not indicate a general method which
might be used for the derivation of $\mu_{1:\bar\mu_t}$, (t > 2).

A distribution of discrete variates grouped in “k groupings of k” is a special
case of a universe of n finite populations and hence the methods and formulas
for the expectations of population moments are applicable to our present
problem.

It is found that the adjustment formulas for moment-functions of grouped
data involve central moments of a rectangular distribution. It will be
convenient for our present purposes to give a brief treatment of the
moment-functions of a rectangular distribution.


1. Moment-functions of a rectangular distribution. Consider the rectangular 
distribution of discrete variates, 


(2.1) h, 2h, 3h, --- , kh. 


It is readily shown that the moment generating function of (2.1),


(2.2) $G_x(\theta) = \mu_{0:R} + \mu_{1:R}\theta + \mu_{2:R}\frac{\theta^2}{2!} + \cdots + \mu_{n:R}\frac{\theta^n}{n!} + \cdots,$


may be written


(2.3) $G_x(\theta) = \frac{e^{\frac{1}{2}(k+1)h\theta}\sinh\frac{1}{2}kh\theta}{k\sinh\frac{1}{2}h\theta}.$

Setting the expansion of the right member of (2.3) equal to the right member of
(2.2) and equating coefficients of like powers of $\theta$, we obtain the
following recursion formula for the moments of (2.1)


(1) (2) 
ae = Mna:rk — a Apn—i:r + eee 


(2.4) 


(r) 
(1 ae Pc > +++ eG, 


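where $\mu_{n:R}$ represents the nth moment of a rectangular distribution. Formulas
for $\mu_{n:R}$, (n = 0, 1, ..., 10) are given below. See Sasuly [27; p. 27].


$\mu_{0:R} = 1.$
$\mu_{1:R} = \frac{1}{2}(k+1)h.$
$\mu_{2:R} = \frac{1}{6}(k+1)(2k+1)h^2 = \frac{1}{3}(2k+1)h\,\mu_{1:R}.$
$\mu_{3:R} = \frac{1}{4}(k+1)^2kh^3 = kh\,\mu_{1:R}^2.$
$\mu_{4:R} = \frac{1}{5}(3k^2 + 3k - 1)h^2\,\mu_{2:R}.$
$\mu_{5:R} = \frac{1}{3}(2k^2 + 2k - 1)h^2\,\mu_{3:R}.$
$\mu_{6:R} = \frac{1}{7}(3k^4 + 6k^3 - 3k + 1)h^4\,\mu_{2:R}.$
$\mu_{7:R} = \frac{1}{6}(3k^4 + 6k^3 - k^2 - 4k + 2)h^4\,\mu_{3:R}.$
$\mu_{8:R} = \frac{1}{15}(5k^6 + 15k^5 + 5k^4 - 15k^3 - k^2 + 9k - 3)h^6\,\mu_{2:R}.$
$\mu_{9:R} = \frac{1}{5}(2k^6 + 6k^5 + k^4 - 8k^3 + k^2 + 6k - 3)h^6\,\mu_{3:R}.$
$\mu_{10:R} = \frac{1}{11}(3k^8 + 12k^7 + 8k^6 - 18k^5 - 10k^4 + 24k^3 + 2k^2 - 15k + 5)h^8\,\mu_{2:R}.$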
The deviations about the mean of (2.1) are

(2.6) $-\frac{1}{2}(k-1)h,\ -\frac{1}{2}(k-3)h,\ \cdots,\ \frac{1}{2}(k-3)h,\ \frac{1}{2}(k-1)h.$

Therefore,

(2.7) $\bar\mu_{2n+1:R} = 0.$

If we denote (2.6) by $\bar x$, we have

(2.8) $G_{\bar x}(\theta) = \frac{\sinh\frac{1}{2}kh\theta}{k\sinh\frac{1}{2}h\theta}.$

The recursion formula for central moments of (2.1) is 
> 7 nr : mee re ee 
(2.9) ‘ RT (Qn + 1)? ne "he" 
2 (r+1)! 2 
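Formulas for $\bar\mu_{2n:R}$, (n = 0, 1, ..., 5) are given below. See [27; p. 27].

(2.10)
$\bar\mu_{0:R} = 1,$
$\bar\mu_{2:R} = \frac{1}{12}(k^2 - 1)h^2,$
$\bar\mu_{4:R} = \frac{1}{20}(3k^2 - 7)h^2\,\bar\mu_{2:R},$
$\bar\mu_{6:R} = \frac{1}{112}(3k^4 - 18k^2 + 31)h^4\,\bar\mu_{2:R},$
$\bar\mu_{8:R} = \frac{1}{960}(5k^6 - 55k^4 + 239k^2 - 381)h^6\,\bar\mu_{2:R},$
$\bar\mu_{10:R} = \frac{1}{2816}(3k^8 - 52k^6 + 410k^4 - 1636k^2 + 2555)h^8\,\bar\mu_{2:R}.$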
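
The closed forms (2.10) can be compared with central moments computed directly from the variates h, 2h, ..., kh. The following is only a sketch with illustrative names; the values of k and h tried are arbitrary.

```python
from fractions import Fraction

def rect_central_moment(k, h, t):
    """t-th central moment of the discrete rectangular distribution h, 2h, ..., kh."""
    h = Fraction(h)
    values = [j * h for j in range(1, k + 1)]
    mean = sum(values) / k
    return sum((v - mean) ** t for v in values) / k

def formula_2_10(k, h, t):
    """Closed forms (2.10) for the even central moments, t = 0, 2, ..., 10."""
    h = Fraction(h)
    k2 = k * k
    mu2 = Fraction(k2 - 1, 12) * h ** 2
    table = {
        0: Fraction(1),
        2: mu2,
        4: Fraction(3 * k2 - 7, 20) * h ** 2 * mu2,
        6: Fraction(3 * k2 ** 2 - 18 * k2 + 31, 112) * h ** 4 * mu2,
        8: Fraction(5 * k2 ** 3 - 55 * k2 ** 2 + 239 * k2 - 381, 960) * h ** 6 * mu2,
        10: Fraction(3 * k2 ** 4 - 52 * k2 ** 3 + 410 * k2 ** 2 - 1636 * k2 + 2555,
                     2816) * h ** 8 * mu2,
    }
    return table[t]

for k in range(1, 9):
    for h in (1, Fraction(1, 2), 3):
        for t in range(0, 11, 2):
            assert rect_central_moment(k, h, t) == formula_2_10(k, h, t)
        # odd central moments vanish, as stated in (2.7)
        assert rect_central_moment(k, h, 3) == 0
```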


From the relation which connects Thiele seminvariants and the moment
generating function, we get, see [25; p. 57],

(2.11) $\lambda_{0:R} = 0, \quad \lambda_{1:R} = \frac{(k+1)h}{2}, \quad \lambda_{2n+1:R} = 0,$

$\lambda_{2n:R} = (-1)^{n+1}\frac{B_n(k^{2n} - 1)h^{2n}}{2n},$

where $\lambda_{n:R}$ represents the nth Thiele seminvariant of a rectangular
distribution of discrete variates and $B_n$, (n = 1, 2, ...), the Bernoulli
numbers: $\frac{1}{6}, \frac{1}{30}, \cdots$.


In each of the cases considered in this section, corresponding formulas may be
found for a rectangular distribution of continuous variates by setting h = m/k
(which makes the range m with k subdivisions) and then letting $k \to \infty$.
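
As an illustration of (2.11), the second, fourth, and sixth seminvariants may be built from central moments and compared with the Bernoulli-number form. The sketch assumes the usual relations $\lambda_4 = \bar\mu_4 - 3\bar\mu_2^2$ and $\lambda_6 = \bar\mu_6 - 15\bar\mu_4\bar\mu_2 + 30\bar\mu_2^3$ (valid when the odd central moments vanish) and the values $B_1 = 1/6$, $B_2 = 1/30$, $B_3 = 1/42$; function names are illustrative.

```python
from fractions import Fraction

# Bernoulli numbers in the positive-valued convention used in the text
BERNOULLI = {1: Fraction(1, 6), 2: Fraction(1, 30), 3: Fraction(1, 42)}

def rect_central_moment(k, h, t):
    """t-th central moment of the discrete rectangular distribution h, 2h, ..., kh."""
    h = Fraction(h)
    values = [j * h for j in range(1, k + 1)]
    mean = sum(values) / k
    return sum((v - mean) ** t for v in values) / k

def seminvariants_from_moments(k, h):
    """Seminvariants of orders 2, 4, 6 of a symmetric distribution."""
    m2, m4, m6 = (rect_central_moment(k, h, t) for t in (2, 4, 6))
    return {1: m2, 2: m4 - 3 * m2 ** 2, 3: m6 - 15 * m4 * m2 + 30 * m2 ** 3}

def formula_2_11(k, h, n):
    """lambda_{2n:R} as given by (2.11)."""
    h = Fraction(h)
    return (-1) ** (n + 1) * BERNOULLI[n] * (k ** (2 * n) - 1) * h ** (2 * n) / (2 * n)

for k in range(1, 8):
    for h in (1, Fraction(1, 3)):
        computed = seminvariants_from_moments(k, h)
        for n in (1, 2, 3):
            assert computed[n] == formula_2_11(k, h, n)
```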







2. Adjustments for moments. As our basic distribution we consider the set of
discrete variates, $x_i$, (i = 1, 2, ..., N), where some of the $x_i$'s may not be
distinct. We assume that the given distribution is grouped in “k groupings
of k”.

When $x_i$ is placed in the rth position of a class, the limits of the class are
$x_i - (r-1)h$ and $x_i + (k-r)h$ and the class mark is
$x_i + \left[\frac{k+1}{2} - r\right]h$. Thus, when the class mark is used as the
value of $x_i$, the quantity $\left[\frac{k+1}{2} - r\right]h$ is added to the
true value of $x_i$. Therefore, when the expected value of a
particular moment for “k groupings of k” is found, each variate has made a
definite contribution as it was placed in each of the k positions of a class.

For convenience, we define


(2.12) $e_r = \left[\frac{k+1}{2} - r\right]h.$

As was previously indicated, the expected value of a given moment involves
the contribution of each variate as it occupies the k class positions. A
convenient method of finding these contributions is by means of a universe
${}_kU_N$ which is composed of the populations ${}_rX$, (r = 1, 2, ..., k). The rth
population consists of the values of the variates when they occupy the rth
position of the class. Hence ${}_rX$ consists of ${}_rx_i = x_i + e_r$,
(i = 1, 2, ..., N).

The notation for moments is the same as that of Part I. Since ${}_kU_N$ is of the
same form as the universe studied in Part I, we use the definitions (1.1) of that
part.

The expected value of the tth moment is


$\mu_{1:\mu_t} = \frac{1}{k}\sum_{r=1}^{k}E(x_i + e_r)^t = \sum_{s=0}^{t}\binom{t}{s}\mu_{t-s}\left[\frac{1}{k}\sum_{r=1}^{k}e_r^s\right].$


Many devices have been used by previous writers [24; p. 269], [25; p. 57],
[26; p. 157], to evaluate terms of the form $\frac{1}{k}\sum_{r=1}^{k}e_r^s$.
However, it should be noticed that the quantities $e_r$, (r = 1, 2, ..., k), are
respectively identical with the deviations (2.6) about the mean of a rectangular
distribution of discrete variates. It follows that


$\bar\mu_{s:R} = \frac{1}{k}\sum_{r=1}^{k}e_r^s.$

And since $\bar\mu_{2s+1:R} = 0$, we have


(2.13) $\mu_{1:\mu_t} = \sum_{s=0}^{[t/2]}\binom{t}{2s}\mu_{t-2s}\,\bar\mu_{2s:R}.$


Formulas for $\bar\mu_{2s:R}$, (s = 0, 1, ..., 5) are given by (2.10).
If the class marks are selected as the unit of x, we set h = 1 in (2.10). If the
class interval is chosen as the unit of x, we set h = 1/k in (2.10). If k
consecutive values of the discrete variable are grouped in a frequency class of
width m, we put h = m/k in (2.10).

Usually we desire to estimate the value of the moments that would have been 
obtained if we had not grouped the data. Therefore (2.13) is solved for the 
moments of the ungrouped data. We have 


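(2.14) $\mu_t = \sum_{s=0}^{[t/2]}\binom{t}{2s}P_{2s}\,\mu_{1:\mu_{t-2s}},$

wherein

$P_{2s} = \sum\frac{(-1)^p(2s)!\,p!\,\bar\mu_{2p_1:R}^{\pi_1}\bar\mu_{2p_2:R}^{\pi_2}\cdots\bar\mu_{2p_v:R}^{\pi_v}}{[(2p_1)!]^{\pi_1}[(2p_2)!]^{\pi_2}\cdots[(2p_v)!]^{\pi_v}\,\pi_1!\,\pi_2!\cdots\pi_v!},$

the summation being taken for every possible product of moments for which

$\sum_{i=1}^{v}p_i\pi_i = s, \quad \sum_{i=1}^{v}\pi_i = p.$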
Formulas, corresponding to (2.13) and (2.14), for a distribution of continuous 
variates are written by replacing the moment symbols for discrete variates by 
those for continuous variates. 
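
Formula (2.13) may be checked by actually forming the k groupings of k. The sketch below assumes integer variates with h = 1, a hypothetical frequency table, and moments about the fixed point 0; the class mark is taken as the midpoint of the k consecutive values in a class. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def grouped_moment(freq, k, offset, t):
    """t-th moment about 0 after grouping; classes begin at values congruent to offset mod k."""
    N = sum(freq.values())
    total = Fraction(0)
    for x, f in freq.items():
        lower = x - ((x - offset) % k)        # smallest value in the class holding x
        mark = lower + Fraction(k - 1, 2)     # class mark (midpoint), h = 1
        total += f * mark ** t
    return total / N

def expected_grouped_moment(freq, k, t):
    """Average of the t-th moment over the k possible groupings."""
    return sum(grouped_moment(freq, k, g, t) for g in range(k)) / k

def formula_2_13(freq, k, t):
    """Right-hand member of (2.13) with h = 1."""
    N = sum(freq.values())
    e = [Fraction(k + 1, 2) - r for r in range(1, k + 1)]   # the e_r of (2.12)

    def mu(j):                                # moments of the ungrouped data
        return Fraction(sum(f * x ** j for x, f in freq.items()), N)

    def rect(j):                              # central moments of the rectangular distribution
        return sum(v ** j for v in e) / k

    return sum(comb(t, 2 * s) * mu(t - 2 * s) * rect(2 * s)
               for s in range(t // 2 + 1))

freq = {1: 4, 2: 7, 3: 2, 5: 6, 6: 1, 8: 3, 9: 5, 10: 2}   # hypothetical data
for k in (2, 3, 4):
    for t in range(1, 7):
        assert expected_grouped_moment(freq, k, t) == formula_2_13(freq, k, t)
```

Each variate occupies every class position exactly once as the offset runs through its k values, which is why the average over groupings reproduces the sum over the $e_r$.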


3. Adjustments for central moments. Consider the universe U which consists
of the populations ${}_rX$, (r = 1, 2, ..., k), where ${}_rX$ is the rth
grouped-distribution. The expected value of the tth central moment of the k
grouped-distributions is given by (1.3), (1.4) and (1.5) of Part I, where now
$\mu_{1:\mu_t}$ is given by (2.13) of the preceding section. Thus, the development
of this section is identical with that of section one of Part I with the single
exception that $\mu_{1:\mu_t} = \mu_t$ no longer holds but is replaced by
$\mu_{1:\mu_t} = \mu_t$ + a correction. Therefore, the formulas for
the adjustments for central moments may be obtained immediately from the
formulas derived in section one, Part I, if the corrections of the preceding section
are inserted. We have


(2.15) $\mu_{1:\bar\mu_2} = \bar\mu_2 + \bar\mu_{2:R} - \bar\mu_{2:\mu_1},$

(2.16) $\mu_{1:\bar\mu_3} = \bar\mu_3 + 6\mu_1\bar\mu_{2:\mu_1} - 3\bar\mu_{11:\mu_1\mu_2} + 2\bar\mu_{3:\mu_1},$

(2.17) $\mu_{1:\bar\mu_4} = \bar\mu_4 + 6\bar\mu_2\bar\mu_{2:R} + \bar\mu_{4:R} + 6(\bar\mu_2 - 2\mu_1^2 + \bar\mu_{2:R})\bar\mu_{2:\mu_1} + 12\mu_1\bar\mu_{11:\mu_1\mu_2} - 12\mu_1\bar\mu_{3:\mu_1} - 4\bar\mu_{11:\mu_1\mu_3} + 6\bar\mu_{21:\mu_1\mu_2} - 3\bar\mu_{4:\mu_1}.$


The moments of the ungrouped data can be obtained readily from formulas
(2.15) through (2.17).
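
The adjustments for the second and third central moments, (2.15) and (2.16), can be verified on a small example. The sketch assumes h = 1, a hypothetical frequency table, and the fixed point 0; the variance, third central moment, and product moment of the grouped means are computed across the k groupings. All names are illustrative.

```python
from fractions import Fraction

def grouped_raw_moments(freq, k, offset):
    """Moments of orders 1, 2, 3 about 0 of one grouped-distribution (h = 1)."""
    N = sum(freq.values())
    marks = {x: x - ((x - offset) % k) + Fraction(k - 1, 2) for x in freq}
    return [sum(f * marks[x] ** t for x, f in freq.items()) / N for t in (1, 2, 3)]

def average(values):
    return sum(values) / len(values)

def check_central_adjustments(freq, k):
    """Return (lhs2, rhs2, lhs3, rhs3) for formulas (2.15) and (2.16)."""
    N = sum(freq.values())
    mu1, mu2, mu3 = (Fraction(sum(f * x ** t for x, f in freq.items()), N)
                     for t in (1, 2, 3))
    mu2_bar = mu2 - mu1 ** 2
    mu3_bar = mu3 - 3 * mu2 * mu1 + 2 * mu1 ** 3
    rect2 = Fraction(k * k - 1, 12)           # second central moment of (2.1), h = 1

    groups = [grouped_raw_moments(freq, k, g) for g in range(k)]
    g1 = [m[0] for m in groups]
    g2 = [m[1] for m in groups]
    M1, M2 = average(g1), average(g2)

    # averages of the central moments of the k grouped-distributions
    lhs2 = average([m2 - m1 ** 2 for m1, m2, _ in groups])
    lhs3 = average([m3 - 3 * m2 * m1 + 2 * m1 ** 3 for m1, m2, m3 in groups])

    var_m1 = average([(a - M1) ** 2 for a in g1])        # variance of grouped means
    third_m1 = average([(a - M1) ** 3 for a in g1])      # third central moment of grouped means
    cov_12 = average([(a - M1) * (b - M2) for a, b in zip(g1, g2)])

    rhs2 = mu2_bar + rect2 - var_m1
    rhs3 = mu3_bar + 6 * mu1 * var_m1 - 3 * cov_12 + 2 * third_m1
    return lhs2, rhs2, lhs3, rhs3

freq = {1: 4, 2: 7, 3: 2, 5: 6, 6: 1, 8: 3, 9: 5, 10: 2}   # hypothetical data
for k in (2, 3, 4):
    lhs2, rhs2, lhs3, rhs3 = check_central_adjustments(freq, k)
    assert lhs2 == rhs2 and lhs3 == rhs3
```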

Adjustment formulas for central moments of a distribution of continuous
variates may be obtained from (2.13) by replacing the moment symbols for
discrete variates by those for continuous variates and taking the moments about
the mean. Also, it may be observed that adjustment formulas for central
moments of a distribution of continuous variates may be obtained from formulas
(1.3), (1.4) and (1.5) of Part I, provided the moment symbols are exchanged as
indicated above and terms of the form
$\bar\mu_{s_1 s_2\cdots s_v:\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}}$ are set equal to zero.






4. Usual adjustments for Thiele seminvariants. The usual adjustments for 
Thiele seminvariants, for the univariate discrete population, may be developed 
directly by use of one of the fundamental properties of Thiele seminvariants. 

It is assumed (see [25; p. 55]) that k consecutive values of the discrete variable
are grouped in a frequency class of width m. The k smaller intervals of width
m/k = h go to make up the class width m, the actual points representing the k
values of the variable being plotted at the centers of the sub-intervals. Now,
let us suppose that each of the k consecutive boundary points of the subintervals
is as likely to be chosen as a boundary point of the larger intervals as any other.
Then, if $\bar x_i$ is the class mark of the ith frequency class, for any true
value, $x$, of the discrete variable included in this frequency class, we have


$\bar x_i = x + e_r,$


in which $x$ and $e_r$ are independent variables and $e_r$ takes on the k values
(2.12) with equal relative frequencies 1/k.

Since we have noted that the equally likely values which e, may take on are 
deviations about the mean of a rectangular distribution of discrete variates, we 
employ the cumulative property of Thiele seminvariants [9; p. 4] and obtain 
directly 


(2.18) $\lambda_{t:\bar x} = \lambda_{t:x} + \lambda_{t:R}, \quad (t = 1, 2, \cdots),$


where $\lambda_{t:\bar x}$ is the tth seminvariant computed from the grouped data,
$\lambda_{t:x}$ is the tth seminvariant computed from the ungrouped data and
$\lambda_{t:R}$ is defined by (2.11).

Formulas corresponding to (2.18), for special values of t, are given by Craig 
[25; p. 57]. However, the present development indicates the dependence of 
adjustment formulas on central moments of a rectangular distribution and pro- 
vides a general formula for these adjustments which is expressed completely in 
terms of Thiele seminvariants. 


5. New adjustments for Thiele seminvariants. If we accept the definition


$\mu_{1:\lambda_t} = \frac{1}{k}\sum_{r=1}^{k}{}_r\lambda_t, \quad (t = 2, 3, \cdots),$


then (2.18) is at best only an approximation formula. We now desire exact
formulas for $\mu_{1:\lambda_t}$ for the case of a grouped-distribution of discrete
variates.

First (1.9) is used and terms of the form
$\mu_{s_1 s_2\cdots s_v:\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}}$ are evaluated in terms
of central moments by (1.3). Then terms of the form $\mu_{1:\mu_t}$ are evaluated by
(2.13) and finally the relations between moments and Thiele seminvariants are
employed. Exact formulas for the expected values of the second, third, and
fourth Thiele seminvariants for grouped-distributions of discrete variables are
given below.


(2.19) $\mu_{1:\lambda_2} = \lambda_2 + \lambda_{2:R} - \bar\mu_{2:\mu_1}.$

(2.20) $\mu_{1:\lambda_3} = \lambda_3 + 6\lambda_1\bar\mu_{2:\mu_1} - 3\bar\mu_{11:\mu_1\mu_2} + 2\bar\mu_{3:\mu_1}.$

(2.21) $\mu_{1:\lambda_4} = \lambda_4 + \lambda_{4:R} + 12[\lambda_2 - 2\lambda_1^2 + \lambda_{2:R}]\bar\mu_{2:\mu_1} + 24[\bar\mu_{11:\mu_1\mu_2} - \bar\mu_{3:\mu_1}]\lambda_1 - 4\bar\mu_{11:\mu_1\mu_3} + 12\bar\mu_{21:\mu_1\mu_2} - 6\bar\mu_{4:\mu_1} - 3\bar\mu_{2:\mu_2}.$

Formulas for Thiele seminvariants of ungrouped data in terms of expectations 
may be obtained from (2.19) through (2.21). 

Adjustment formulas for Thiele seminvariants of a distribution of continuous 
variates are given by Langdon and Ore [23; p. 231] and Craig [25; p. 57]. If we
denote the tth Thiele seminvariant of a distribution of continuous variates by
$L_t$, then


(2.22) $\bar L_t = L_t + L_{t:R},$


where


(2.23) $L_{2t+1:R} = 0, \quad L_{2t:R} = \frac{(-1)^{t+1}B_t m^{2t}}{2t}, \quad (t = 1, 2, \cdots).$


Formulas (2.19) through (2.21) may be used for continuous variates by
changing the moment symbols and setting terms of the form
$\bar\mu_{s_1 s_2\cdots s_v:\mu_{t_1}\mu_{t_2}\cdots\mu_{t_v}}$ equal to zero.

6. Adjustment formulas applied to a numerical problem. We consider the 
arbitrary distribution given in Table III.


TABLE III 


An Arbitrary Distribution of Discrete Variates 


v | a } v ¥ 

a \|— 7 1 

8 1 
a a 











5 





4 | 2% | 
4 
The three grouped distributions, when the variates are grouped in “groupings
of three,” appear in Table IV. 


TABLE IV 
Distributions Derived from Data of Table III by Making the Three Possible Groupings of Three 
(1) (2) 


Class f 





| 
| 
| Class Class 
1-3 20 0-2 -1tol 
4-6 3-5 2-4 
7-9 
10-12 


6-8 5-7 








9-11 8-10 





Using the fixed point 4, moment-functions are computed for the distribution of 
Table III and for each of the distributions of Table IV. These quantities 
along with the average of each moment function appear in Table V. 


TABLE V 


Moment-Functions of the Distributions of Table III and Table IV. Averages of Moment- 
Functions of Distributions of Table IV 


Ma Be = 2] Bs = As Ba Na 


1125 9819 | —17442 |238,849,317|—50, 388,966 
60 (60)? (60)° (60)* (60)* 


2511 10179 567162 (557,840,277) 247,004, 154 
(60)? (60)? (60)* (60)* 


8820 1317600 |528, 282,000) 294,904,800 
(60)? (60)* | (60)* (60)* 


9606 622440 [441 , 657,198) 163,839,996 
(60)? (60)* (60)* (60)* 





Orig. ° 7460 642400 (305,034,000) 138,079,200 
Dist. (60)? (60) (60)* (60)* 

















Table VI gives the expected values of the moment-functions as obtained by 
substituting from Table V into the formulas of sections two, three, and five. 
Also the expected values, computed from the usual formulas, are given and the 
errors which would be made, if the usual formulas were used, are indicated. 
TABLE VI
Expected Values of Moment-Functions Computed by Formulas 





: . Pisin = |Fizgs = 
inenianieal Fizu, | Ais = ahd 


; : 30) 441,657,198 | 163,839,996 
New Formulas | —— a 


60 (60) 


—10 416,778,000 | 133,795,200 
60 (60)* (60)* 


Usual Formulas 





960 | —24,879, 198 | — 30,060,796 
(60)¢ (60)* 





7. Evaluation of $\bar\mu_{2:\mu_1}$. It appears at first that it is necessary to
form the “k groupings of k” in order to evaluate the term $\bar\mu_{2:\mu_1}$ which
enters the precise formula for the expected value of the variance. That was the
procedure followed by Carver [26; p. 161]. However, it is possible to evaluate
$\bar\mu_{2:\mu_1}$ from the ungrouped data without forming a single
grouped-distribution.

By definition,


$\bar\mu_{2:\mu_1} = \frac{1}{k}\sum_{r=1}^{k}\left[{}_r\mu_1 - \mu_1\right]^2,$


where ${}_r\mu_1$ is the mean of the rth grouped-distribution and $\mu_1$ is the
mean of the ungrouped distribution. We wish to study the terms ${}_r\mu_1$ and
$\mu_1$. Consider a set of variates $x_i$, (i = 1, 2, ..., s), with corresponding
frequencies $f_i$, (i = 1, 2, ..., s). The x's are subject to the condition,
$x_i - x_{i-1} = 1$, and consequently some of the f's may be zero. The mean of
this distribution is $\frac{\sum xf}{\sum f}$.

We define

$F_i = f_i + f_{k+i} + f_{2k+i} + \cdots, \quad (i = 1, 2, \cdots, k).$

Then, if a grouped-distribution is formed with $x_1$ in the ith (i = 1, 2, ..., k)
position of a class, the mean of this grouped-distribution is


$\frac{\sum xf + \sum_{j=1}^{k}F_j e_{i+j-1}}{\sum f},$


where $e_{i-1} = e_k$ if $e_i = e_1$ and $e_{i+1} = e_1$ if $e_i = e_k$. Similarly
if a grouped-distribution is formed with $x_1$ in the (i + 1)st position of a
class, the mean is


$\frac{\sum xf + \sum_{j=1}^{k}F_j e_{i+j}}{\sum f}.$
Thus, it is evident that, given the expression for the mean of any
grouped-distribution in which $x_1$ is in the ith position of a class, we may form
the expression for the mean of the grouped-distribution in which $x_1$ is in the
(i + 1)st position of a class by a cyclic permutation of the $e_r$'s of the given
expression. Therefore, it follows that if we call ${}_r\mu_1$ the mean of the
grouped-distribution in which $x_1$ is in the rth (r = 1, 2, ..., k) position of a
class, then

${}_r\mu_1 - \mu_1 = \frac{\sum_{j=1}^{k}F_j e_{r+j-1}}{\sum f}, \quad (r = 1, 2, \cdots, k).$


If we define


$N = \sum f$ and $\phi_r = \sum_{j=1}^{k}F_j e_{r+j-1}$,

then

$\bar\mu_{2:\mu_1} = \frac{1}{kN^2}\sum_{r=1}^{k}\phi_r^2.$


Thus, it is evident that $\bar\mu_{2:\mu_1}$ is a function of the frequencies of
the variates and of the $e_r$'s. The fact that the values of the variates do not
enter $\bar\mu_{2:\mu_1}$ permits one to calculate its value quickly.

Consider $\bar\mu_{2:\mu_1}$ for the distribution of Table III. We find


$\phi_1 = 33e_1 + 13e_2 + 14e_3.$

Then, by successive cyclic permutations of the $e_r$'s,

$\phi_2 = 33e_2 + 13e_3 + 14e_1,$
$\phi_3 = 33e_3 + 13e_1 + 14e_2.$


Substituting the values $e_1 = 1$, $e_2 = 0$, $e_3 = -1$ we have $\phi_1 = 19$,
$\phi_2 = 1$ and $\phi_3 = -20$. Therefore,


$\bar\mu_{2:\mu_1} = \frac{254}{(60)^2},$

which is identical with the value which was found when Table V was used.

It follows from the preceding development that


$\bar\mu_{n:\mu_1} = \frac{1}{kN^n}\sum_{r=1}^{k}\phi_r^n$


and if $F_1 = F_2 = \cdots = F_k$ then $\bar\mu_{n:\mu_1}$ is zero.

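
The numerical evaluation above can be reproduced in a few lines. This sketch takes the pooled frequencies $F_1 = 33$, $F_2 = 13$, $F_3 = 14$ from the text, forms the $\phi_r$ by cyclic permutation of the $e_r$, and returns $\bar\mu_{2:\mu_1}$; the function name is illustrative.

```python
from fractions import Fraction

def variance_of_grouped_means(F, h=1):
    """Return the phi_r and the variance of the grouped means from pooled frequencies F_1..F_k."""
    k = len(F)
    N = sum(F)
    # e_r = [(k + 1)/2 - r] h, r = 1, ..., k, as in (2.12)
    e = [(Fraction(k + 1, 2) - r) * h for r in range(1, k + 1)]
    # phi_r = sum_j F_j e_{r+j-1}, subscripts of e taken cyclically (0-based here)
    phis = [sum(F[j] * e[(r + j) % k] for j in range(k)) for r in range(k)]
    return phis, sum(p * p for p in phis) / (k * N * N)

phis, variance = variance_of_grouped_means([33, 13, 14])
assert phis == [19, 1, -20]
assert variance == Fraction(254, 60 ** 2)
```

Only the pooled frequencies enter the computation; the values of the variates themselves are not needed, which is the point made in the text.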


8. Conclusion. The results of this paper include: 
1. The derivation of general and specific formulas for the expected values of 
population moment-functions. 







2. The derivation of generalized sampling formulas under the condition that 
samples of n are formed by selecting one variate from each population. 
3. Methods for the transformation of generalized sampling formulas into the 
corresponding infinite and finite sampling formulas. 
4. A method for the transformation of infinite sampling formulas into the 
corresponding finite sampling formulas. 
5. A demonstration of the fact that adjustment formulas for moment-functions
of grouped data involve central moments of a rectangular distribution.
6. A general formula for the expected value of the tth moment of grouped data.
7. New adjustment formulas for central moments of grouped data.
8. New adjustment formulas for Thiele seminvariants of grouped data.
9. A method for the evaluation of the term $\bar\mu_{2:\mu_1}$ which appears in the
precise adjustment formula for the variance.


Many thanks are due Prof. P. S. Dwyer, to whom the writer is greatly
indebted for advice and criticisms.
THE ANALYSIS OF VARIANCE WHEN EXPERIMENTAL ERRORS
FOLLOW THE POISSON OR BINOMIAL LAWS


By W. G. Cochran


1. Introduction. The use of transformations has recently been discussed by 
several writers [1], [2], [3], [4], in applying the analysis of variance to experi- 
mental data where there is reason to suspect that the experimental errors are 
not normally distributed. Two types of transformations appear to be coming 
into fairly common use: $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$. The former is considered
appropriate where the data are small integers whose experimental errors follow the
Poisson law, while the latter applies to fractions or percentages derived from 
the ratio of two small integers, where the experimental errors follow the binomial 
frequency distribution. In each case the object of the transformation is to put 
the data on a scale in which the experimental variance is approximately the 
same on all plots, so that all plots may be used in estimating the standard error 
of any treatment comparison. The extent to which these transformations are 
likely to succeed in so doing has been examined by Bartlett [2]. The object of 
the present paper is to discuss the theoretical basis for these transformations in 
more detail, and in particular to examine their relation to a more exact analysis. 


2. Experimental variation of the Poisson type. The first step in an exact
statistical analysis of the results of any field experiment, is to specify in mathe- 
matical terms (1) how the expected values on each plot are obtained in terms of 
unknown parameters representing the treatment and block (or row and column) 
effects (2) how the observed values on the plots vary about the expected values. 
In this section, the variation is assumed to follow the Poisson law. 

The specification of the expected values requires some consideration. In the 
standard theory of the analysis of variance, treatment and block (or row and 
column) effects are assumed to be additive. In the case of a Latin square, for 
example, the expected yield $m_i$ of the ith plot, which receives the tth treatment
and occurs in the rth row and the cth column is written


(1) $m_i = G + T_t + R_r + C_c$


where G is a parameter representing the average level of yield in the experiment,
and $T_t$, $R_r$ and $C_c$ represent the respective effects of the treatment, row
and column to which the plot corresponds. Since the T, R and C constants are
required only to measure differences between different treatments, rows and
columns, we may put


(2) $\sum T_t = \sum R_r = \sum C_c = 0.$
If the experimental errors are normally and independently distributed with
equal variance, this specification leads to very simple equations of estimation
for the unknown parameters, the maximum likelihood estimate of $T_t$, for
example, being the difference between the mean yield of all plots receiving that 
treatment and the general mean. In addition to its simplicity, this type of 
prediction formula is fairly suitable for general use, because it gives a good 
approximation to most types of law which might be envisaged, provided that 
row and column differences are small in relation to the mean yield. However, 
in considering an exact analysis with Poisson variation, the prediction formula 
is assumed chosen, without reference to computational simplicity, as being the 
most suitable to describe the combined actions of treatment and soil effects. 

The probability of obtaining a given set of plot yields $x_i$ with expectations
$m_i$ may be written


$\prod_i\frac{e^{-m_i}m_i^{x_i}}{x_i!}.$


Thus L, the logarithm of the likelihood, is given by


(3) $L = \sum(x_i\log m_i - m_i) - \sum\log x_i!$


Hence the maximum likelihood equation of estimation for any parameter $\theta$
assumes the form


(4) $\sum\frac{(x_i - m_i)}{m_i}\frac{\partial m_i}{\partial\theta} = 0,$


where the summation extends over all plots whose expectations involve $\theta$.
The function $\frac{\partial m_i}{\partial\theta}$ will usually involve a number of
parameters. Since the specification of row, column and treatment effects in a
6 x 6 Latin square requires 16
independent parameters, the solution of these equations may be expected to be 
laborious, though it may be shortened by the intelligent use of iterative methods. 
The problem of obtaining exact tests of significance is also difficult. The 
method of maximum likelihood provides estimates of the variances and co- 
variances of the treatment constants, which under certain conditions can be 
assumed to be normally distributed if there is sufficient replication, but this can 
hardly be considered an exact ‘‘small sample”’ solution. 

These remarks show that the exact solution is somewhat too complicated for
frequent use. The difficulty arises principally because the typical equation of
estimation consists of a weighted sum of the deviations of the observed from the
expected values, the weights being
$\frac{1}{m_i}\frac{\partial m_i}{\partial\theta}$. The factor $\frac{1}{m_i}$ was
introduced into the weight by the Poisson variation of the experimental errors,
and must be retained in any theory which claims to apply to Poisson variation.
It is, however, worth considering whether some simplification cannot be
introduced into the equations by assuming some particular form for the prediction
formula. This line of approach seems promising when one considers the
simplification introduced into the “normal theory” case by assuming the
prediction formula to be linear.

For Poisson variation, the linear law does not appear to be particularly
suitable, since it may give negative expectations on some plots (as happens in the
numerical example considered in the next section). Further, while
$\frac{\partial m_i}{\partial\theta}$ becomes a constant, the factor
$\frac{1}{m_i}$ remains in the weight.

The entire weight can be made constant by assuming a linear prediction 
formula in the square roots and transforming the data to square roots. Fora 
Latin square, this prediction formula is written 


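(5) $\sqrt{m_i} = a_i = G + T_t + R_r + C_c,$


where 


(6) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0.$


To find the maximum value of (3) subject to the restrictions (6), we may use the 
method of undetermined multipliers, maximizing 


(7) $L + \lambda\left(\sum_t T_t\right) + \mu\left(\sum_r R_r\right) + \nu\left(\sum_c C_c\right).$


The equation of estimation for a typical treatment constant $T_t$ becomes 


(8) $\sum \left(\frac{x_i - m_i}{m_i}\right)\frac{\partial m_i}{\partial a_i}\frac{\partial a_i}{\partial T_t} + \lambda = \sum \frac{2(x_i - m_i)}{\sqrt{m_i}} + \lambda = 0,$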


the summation being extended over all plots receiving the treatment. If 
$a'_i = \sqrt{x_i}$, then by Taylor's theorem 


(9) $x_i - m_i = (a'_i - a_i)\frac{dm_i}{da_i} + \frac{1}{2!}(a'_i - a_i)^2\frac{d^2 m_i}{da_i^2}.$


If $m_i$ is reasonably large, only the first term on the right-hand side need be 
retained. When $m_i$ is small, we may use, instead of the exact square root, a 
quantity $a'_i$ defined so that 


(10) $x_i - m_i = (a'_i - a_i)\frac{dm_i}{da_i} = 2\sqrt{m_i}\,(a'_i - a_i).$


Thus if the analysis is performed on the quantities $a'_i$ instead of on the original 
data, equation (8) becomes 


(11) $\sum 4(a'_i - a_i) + \lambda = 0.$
On substituting the expectations for $a_i$ from (5), and using (6), we obtain 


(12) $\sum 4(a'_i - G - T_t) + \lambda = 0.$

The corresponding equation for G is 

(13) $\sum 4(a'_i - G) = 0,$


so that G is the general mean of the quantities a'. By adding equations (12) 
over all treatments, and comparing the total with (13), we find λ = 0. Hence 
$T_t$ is the difference between the mean yield of a' over all plots receiving $T_t$ and 
the general mean of a'. In this scale the simplicity of the “normal theory” 
equations has apparently been recovered. Actually, the quantities a' are not 
known exactly, since 


$a'_i = a_i + \frac{x_i - m_i}{2\sqrt{m_i}} = \frac{1}{2}\left(a_i + \frac{x_i}{a_i}\right),$


where a is the expected value of $\sqrt{x}$. However, this process provides a means 
of successively approximating the maximum likelihood solution, by choosing 
first approximations to the quantities a, constructing the a''s, solving for the 
unknown constants and hence obtaining second approximations to the expected 
values. The close relation of a' to $\sqrt{x}$ is seen by remembering one of the 
common rules for finding square roots. This consists in guessing an approximate 
root (a), dividing x by the approximate root, and taking the mean of the 
approximate root (a) and the resulting quotient (x/a). 
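
The averaging rule just described is easy to try numerically. The following Python sketch is an illustration only: the function name `adjusted_root` is an assumption of this example, and the figures 3 and 2.046 are those used for one plot in the numerical example of the next section.

```python
import math

def adjusted_root(x, a):
    """One step of the rule a' = (a + x/a) / 2, where a is the expected
    square root and x is the observed count (a must be positive)."""
    return 0.5 * (a + x / a)

# A single step for an observed count of 3 and an expected root of 2.046.
first_step = adjusted_root(3, 2.046)

# A zero count simply halves the expected root.
zero_plot = adjusted_root(0, 0.786)

# Repeating the step, with a replaced by a' each time, is the classical
# square-root rule and approaches sqrt(x).
a = 2.046
for _ in range(6):
    a = adjusted_root(3, a)
```

In the analysis itself the step is not repeated on a single plot in this way; the expected root a is recomputed from the fitted constants between steps.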

The suitability of the linear prediction formula in square roots must be con- 
sidered in any example in which the above analysis is being employed. The 
law is intermediate in its effects between the linear law and the product law in 
the original data. My experience is that it is fairly satisfactory for general use 
(cf. [2], p. 72). An exception may occur when it is desired to test the 
interaction between two treatments, both of which produce large effects. In this 
case the definition chosen for absence of interaction may not coincide at all 
closely with the definition implied in using the linear law in square roots. An 
example of this case was given in a previous paper [1]. 

In this connection it should be noted that an approximate “goodness of fit” 
test may be obtained of the validity of the assumptions made. Since the quantities 
$a'_i$ enter into the equations of estimation with weight 4, the quantity 
$4\sum (a'_i - a_i)^2$ is distributed approximately as $\chi^2$ with the number of degrees 
of freedom in the error term of the analysis of variance. Some idea of the 
closeness of the approximation may be gathered by considering the simplest 
case in which only the mean yield is being estimated. In this case the observed 
values x are assumed to be drawn from the same Poisson distribution, and the 
sufficient statistic for the mean G is known to be $\sum(x_i)/n$. Since, however, the 
prediction formula is here the same in square roots as in the original scale, and 
since the maximum likelihood solution is invariant to change of scale, the mean 
value a of a' must be exactly $\sqrt{\sum(x)/n}$, as the reader may verify by working 
any particular example. Thus $\sum 4(a' - a)^2$ is found to be $S(x - \bar{x})^2/\bar{x}$, the 
usual $\chi^2$ test for examining whether a set of values x may reasonably be assumed 
to come from the same Poisson distribution. By working out the exact distribution 
of $S(x_i - \bar{x})^2/\bar{x}$ in a number of cases [5], I previously expressed the 
opinion that this quantity followed the $\chi^2$ distribution sufficiently closely for 
most practical uses, even for values of the mean as low as 2. This opinion has 
since been substantiated by Sukhatme [6], who sampled this distribution for 
m = 1, 2, 3, 4, and 5. 
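
The identity between the weighted sum of squares in the adjusted square-root scale and the usual dispersion test can be checked on any set of counts. The sketch below is illustrative only; the counts are hypothetical and the function names are assumptions of this example.

```python
import math

def dispersion_chi_square(counts):
    """S(x - xbar)^2 / xbar, the usual Poisson dispersion test."""
    mean = sum(counts) / len(counts)
    return sum((x - mean) ** 2 for x in counts) / mean

def adjusted_root_chi_square(counts):
    """4 * sum (a' - a)^2 with a = sqrt(xbar) and a' = (a + x/a) / 2."""
    a = math.sqrt(sum(counts) / len(counts))
    adjusted = [0.5 * (a + x / a) for x in counts]
    return 4 * sum((ap - a) ** 2 for ap in adjusted)

def mean_adjusted_root(counts):
    """Mean of the a' values, which should reproduce a = sqrt(xbar)."""
    a = math.sqrt(sum(counts) / len(counts))
    return sum(0.5 * (a + x / a) for x in counts) / len(counts)

counts = [3, 0, 5, 2, 7, 4, 1, 6]  # hypothetical plot counts
```

The agreement is algebraic, since a' - a = (x - x̄)/(2√x̄) when a = √x̄, so it holds for zero counts as well.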

A high value of $\chi^2$ means either that the prediction formula is not satisfactory 
or that the experimental errors are higher than the Poisson distribution 
indicates, or that both causes are operating. These effects can sometimes be 
separated by examining whether the observed yields deviate from the expected 
yields in a systematic or a random manner. If the deviation is systematic, the 
prediction formula is probably unsatisfactory. 

The type of approach used above resembles in many features the “exact” 
analysis for the probit transformation [7]. The principal difference is that in 
the case of probits the transformation is made to suit the a priori prediction 
formula, which postulates that the probits are a linear function of the dosage, 
or of the log (dosage). Thus with probits the equations of estimation still 
involve weights in the transformed scale. These do not seriously complicate 


the analysis, since only two parameters require to be estimated for a given 
poison. With, however, the much greater number of parameters usually in- 
volved in specifying the results of a field experiment, the attractiveness of a 
solution which does not involve weighting is greatly increased. 


3. Numerical example of the square root transformation. A 5 × 5 Latin 
square experiment on the effects of different soil fumigants in controlling wire- 
worms was selected as an example. The average number of wireworms per 
plot (total of four soil samples) was just under five. Previous studies [8], [9] 
have indicated that with small numbers per sample, the distribution of numbers 
of wireworms tends to follow the Poisson law. 

The plan and yields are shown in Table I. The first two figures under the 
treatment symbols are the numbers of wireworms and their square roots 
respectively, the latter being regarded as first approximations to the values a'. Two 
of the plots receiving treatment K gave no wireworms. Since these plots are 
likely to be changed most in the transition from square roots to a', better 
approximations were estimated for them before proceeding with the calculations. 
The best simple approximations appeared to be obtained from the square roots 
of the means in the original units. For the plot in the second row and second 
column, the square roots of the row, column and treatment means in the original 
units are respectively 2.000, 2.145 and 1.095, and the square root of the general 
mean is 2.227. Hence 


a' = ½[2.000 + 2.145 + 1.095 - 2(2.227)] = 0.39. 


The other zero value was similarly found to give a' = 0.79. The corresponding 
estimates from the means of the square roots were considerably too low, since 
the a' values tend to be higher than the square roots. The use of “missing plot” 
technique gave very poor approximations, because it ignores the fact that the 
plots in question had zero yields. 


TABLE I 
Plan and number of wireworms per plot 

[The plot entries and the row and column means of this table are not legible 
in the scanned source. Each plot carried four figures under its treatment 
symbol: (1) original numbers, (2) square roots, (3) second approximations, 
(4) third approximations. The means for treatments N and K are also not 
fully legible.] 

Treatment means (legible entries only) 

                          P       O       M 
Square roots            2.084   2.338   2.456 
Second approximations   2.116   2.394   2.482 
Third approximations    2.118   2.396   2.484 


With the estimated values inserted, the row, column, and treatment means 
of the square roots are as shown in Table I. A second approximation to a' 
was calculated for each plot. For the plot in the first row and the first column, 
the expected yield is 


a = 1.676 + 2.460 + 2.084 - 2(2.087) = 2.046. 


Hence a' = ½(2.046 + 3/2.046) = 1.76. These values constitute the third set 
of figures in Table I. Theoretically, it is advisable to readjust the row, column, 
and treatment means after each new value of a' has been obtained, in order to 
secure rapid convergence. This is rather laborious in practice, and a complete 
set of new plot values was obtained before readjusting the means. The third 
approximations obtained by this method are shown in the fourth lines in Table I 
and are correct to two decimal places. 
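
The cycle of fitting means and recomputing a' can be sketched in Python as follows. This is an illustration under stated assumptions, not the computation of Table I: the 4 × 4 counts and the cyclic layout are hypothetical, the function names are inventions of this example, and a complete set of new plot values is formed before the means are readjusted, as in the text.

```python
import math

def fit_additive(values, layout):
    """Expected value on each plot of a Latin square:
    row mean + column mean + treatment mean - 2 * general mean."""
    k = len(values)
    grand = sum(map(sum, values)) / k ** 2
    row = [sum(values[r]) / k for r in range(k)]
    col = [sum(values[r][c] for r in range(k)) / k for c in range(k)]
    trt = [0.0] * k
    for r in range(k):
        for c in range(k):
            trt[layout[r][c]] += values[r][c] / k
    return [[row[r] + col[c] + trt[layout[r][c]] - 2 * grand
             for c in range(k)] for r in range(k)]

def iterate_adjusted_roots(counts, layout, first_guess, sweeps=50):
    """Alternate between fitting the additive law to a' and
    recomputing a' = (a + x/a) / 2 from the expected roots a."""
    adjusted = first_guess
    for _ in range(sweeps):
        expected = fit_additive(adjusted, layout)
        adjusted = [[0.5 * (a + x / a) for x, a in zip(x_row, a_row)]
                    for x_row, a_row in zip(counts, expected)]
    return adjusted, fit_additive(adjusted, layout)

# Hypothetical counts; treatment index on plot (r, c) is (r + c) mod 4.
counts = [[4, 9, 12, 7], [6, 14, 5, 3], [10, 8, 2, 11], [5, 6, 9, 13]]
layout = [[(r + c) % 4 for c in range(4)] for r in range(4)]
roots = [[math.sqrt(x) for x in row] for row in counts]
adjusted, expected = iterate_adjusted_roots(counts, layout, roots)
```

When the cycle has settled, the expected roots satisfy equation (8) with λ = 0, which offers a check on the arithmetic. Zero counts need a positive first guess, as discussed above, since the plain square root gives no information about the expected value.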

It is noteworthy how closely the square roots agree with the third approxi- 
mations on all plots except those which originally gave zero yields. The differ- 
ences between the second and third approximations are trivial. 

The next step is to make a $\chi^2$ test by means of the quantity $4\sum(a' - a)^2$. 
From the manner in which the values a are constructed from the a''s, it follows 
that $\sum(a' - a)^2$ is simply the error sum of squares in the conventional analysis 
of variance of the values a'. The analysis of variance of the third 
approximations is shown in Table II. 




















TABLE II 
Analysis of variance of adjusted square roots 

              Degrees of freedom   Sum of squares   Mean square 
Rows                  4                2.9815 
Columns               4                1.1190 
Treatments            4                7.5815          1.8954 
Error                12                4.5970          0.3831 


The value of $\chi^2$ is 4 × 4.597 = 18.39, with 12 degrees of freedom, which is 
just about the 10 percent level. If the hypothesis is regarded as disproved 
only when $\chi^2$ exceeds the 5 percent level, the treatment means may be tested 
by regarding them as approximately normally distributed with variance 
1/5 × 0.25 = 0.05. It is, however, more prudent to use the actual error mean 
square as an estimate of the experimental error variance, performing the usual 
tests associated with the analysis of variance. This may be justified on the 
grounds that the calculations have produced a set of plot values a' of equal 
weight. On this basis the standard error of a treatment mean is $\sqrt{0.3831/5}$ = 
0.2768. Treatment K reduced the number of wireworms significantly below 
all other treatments, but there is no indication of any difference between the 
other treatments. The treatment means may be reconverted to the original 
units by squaring. 
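
The arithmetic of this paragraph can be retraced in a few lines. The sketch below uses only the error line of Table II and the figures quoted in the text; the variable names are assumptions of this example.

```python
import math

error_ss = 4.5970          # error sum of squares from Table II
error_df = 12              # error degrees of freedom
plots_per_treatment = 5    # 5 x 5 Latin square

# Each a' carries weight 4, i.e. theoretical variance 1/4 per plot.
chi_square = 4 * error_ss
theoretical_variance_of_mean = 0.25 / plots_per_treatment

# The more prudent alternative: use the actual error mean square.
error_ms = error_ss / error_df
standard_error = math.sqrt(error_ms / plots_per_treatment)
```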


4. Experimental variation of the binomial type. In this case the yields are 
obtained by examining a constant number n units per plot and noting those 
which possess a certain attribute (e.g., plants which are diseased). Experi- 
mental variation is presumed to arise solely from the binomial variation of the 
observed fraction p possessing the attribute about the expected fraction P, which 
is specified in terms of unknown parameters representing the treatment and 
soil effects. 

If $r_i$ is the number possessing the attribute on a typical plot, so that $p_i = r_i/n$, 
the likelihood function takes the form 


$\prod_i \frac{n!}{r_i!\,(n - r_i)!} P_i^{r_i} Q_i^{n - r_i}.$

Hence the terms in the logarithm which involve the unknown parameters are 
given by 


(15) $L = \sum_i \{r_i \log P_i + (n - r_i) \log Q_i\}.$


The equation of estimation for a typical constant θ is 


(16) $\sum \frac{n}{P_i Q_i}\frac{\partial P_i}{\partial \theta}(p_i - P_i) = 0,$


where the summation is over all plots whose expectations involve θ. 

As in the Poisson case, an exact solution is laborious because of the weights 
$\frac{n}{P_i Q_i}\frac{\partial P_i}{\partial \theta}$. The unequal weighting may be removed by transforming to the 
variate $a_i = \sin^{-1}\sqrt{P_i}$, and assuming that the prediction formula is linear 
in the transformed scale. For a Latin square the prediction formula is assumed 
to be 


(17) $a_i = G + T_t + R_r + C_c,$


where the ith plot receives treatment t and lies in the rth row and cth column. 
Further 


(18) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0.$

Since $P_i = \sin^2 a_i$, $\frac{dP_i}{da_i} = 2\sqrt{P_i Q_i}$. A set of variates $a'_i$ is defined so that 
on each plot 


(19) $p_i - P_i = (a'_i - a_i)\frac{dP_i}{da_i} = 2\sqrt{P_i Q_i}\,(a'_i - a_i).$


With these substitutions, the equation of estimation for $T_t$, for instance, 
becomes 


(20) $\sum 4n(a'_i - a_i) + \lambda = 0$


where, as before, λ is an undetermined multiplier. The remainder of the 
solution proceeds exactly as in the Poisson case, $T_t$ being found to be the difference 
between the mean value of $a'_i$ over all plots receiving this treatment and the 
general mean of $a'_i$. A $\chi^2$ test may be made with $\sum 4n(a'_i - a_i)^2$. 


From (19) 


(21) $a'_i = a_i + \frac{1}{2\sqrt{P_i Q_i}}(p_i - P_i) = a_i + \frac{1}{2\sqrt{P_i Q_i}}(Q_i - q_i)$


(22) $\quad = a_i + \frac{1}{2}\cot a_i - q_i\,\mathrm{cosec}\,(2a_i)$


where $q_i$ is the observed fraction which does not possess the attribute. The 
calculation of approximations to $a'_i$ thus involves finding a predicted value $a_i$ 
from the treatment and block (or row and column) means, and using equation 
(22). Tables [10] of the values of $\sin^{-1}\sqrt{P_i}$, $a_i + \frac{1}{2}\cot a_i$, and cosec $(2a_i)$ 
have been prepared to facilitate the computations. It should be noted that 
these tables are in degrees, whereas the above equations assume that $a_i$ is 
measured in radians. In degrees, equation (20) above becomes 


(23) $\sum \frac{n}{820.7}(a'_i - a_i) = 0$


while 


(24) $a'_i = a_i + \frac{180}{\pi}\left\{\frac{1}{2}\cot a_i - q_i\,\mathrm{cosec}\,(2a_i)\right\}.$


As in the Poisson case, the appropriateness of the linearly additive law in 
equivalent angles depends on the way in which treatment and soil effects operate. 
As Bliss has shown [11], the effect of the transformation is to flatten out the 
cumulative normal frequency distribution, extending the range over which it 
can be approximated by a straight line. 


5. Numerical example of the angular transformation. The data were selected 
from a randomized blocks experiment by Carruth [12] on the control by me- 
chanical and insecticidal methods of damage due to corn ear worm larvae. 
The control and the six types of mechanical protection were chosen for analysis, 
the ‘‘yields” being the percentages of ears unfit for sale. The numbers of ears 
varied somewhat from plot to plot, the average being 36.5, but the variations 
were fairly small and appeared to be random. It was considered that varia- 
tions in the weight (4n) could be ignored in solving the equations of estimation. 


TABLE III 
Percentages of unfit ears of corn 

[The plot entries of this table are not legible in the scanned source. For 
each treatment and block the table gave three figures in descending order: 
(1) percentage, (2) equivalent angle, (3) second approximation. Among the 
legible marginal means of the equivalent angles are 39.57 for treatment 1 
and 26.81 for block I.] 



The percentages of unfit ears, the equivalent angles and the second 
approximations to a' are shown in descending order in Table III. The percentages on 
individual plots vary from 2.1 to 55.5. The second approximations were 
calculated from the block and treatment means of the angles. For the control plot 
(treatment 1) in block I, for example, the expected value is 


39.57 + 26.81 - 26.31 = 40.07. 


Since Fisher and Yates's tables of a + ½ cot a and cosec (2a) are given for 
values of a from 45° to 90°, we take the complement of the expected value, 
which is 49.93. Interpolating mentally from the table, we find 


a + ½ cot a = 74.0, cosec (2a) = 58.3. 

Thus the second approximation to the complement of the angle is 

74.0 - 0.424 × 58.3 = 49.3. 


Hence the second approximation to a' is 40.7, which agrees very closely with 
the equivalent angle. 
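
Where tables are not at hand, equation (24) can be evaluated directly. The following Python sketch is illustrative; the function name is an assumption of this example, and the observed fraction 0.424 for this plot is inferred from the multiplier used in the computation above.

```python
import math

def adjusted_angle_degrees(expected_deg, q):
    """Equation (24): a' = a + (180/pi) * (cot(a)/2 - q * cosec(2a)),
    with the expected angle a given in degrees and q the observed
    fraction not possessing the attribute."""
    a = math.radians(expected_deg)
    correction = 0.5 / math.tan(a) - q / math.sin(2 * a)
    return expected_deg + math.degrees(correction)

expected = 39.57 + 26.81 - 26.31           # treatment mean + block mean - general mean
second_approximation = adjusted_angle_degrees(expected, 1 - 0.424)
equivalent_angle = math.degrees(math.asin(math.sqrt(0.424)))
```

Direct evaluation gives about 40.63 for both the second approximation and the equivalent angle, in line with the 40.7 obtained above by mental interpolation in the tables.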

On the majority of the plots, the second approximation differs by only a 
trivial amount from the equivalent angle. The plots with the three lowest 
percentages (2.1, 2.5, and 5.0) have increased somewhat more, and also one or 
two other plots where the angles deviated considerably from the expected values. 
A third set of approximations was not considered necessary. 

The analysis of variance of the second approximations is given in Table IV. 


TABLE IV 

              Degrees of freedom   Sum of squares   Mean squares 
Blocks                5                709.79 
Treatments            6              1,531.56          255.26 
Error                30                982.67           32.76 

Taking n as 36.5, the expected value of the error mean square is 820.7/36.5 = 
22.48. Thus $\chi^2$ = 982.67/22.48 = 43.71, with 30 degrees of freedom, which is 
almost exactly at the 5 percent level. This, together with the appreciable 
amount of the variance removed by blocks, indicates that the experimental 
error probably contains some element other than binomial variation. As in the 
preceding case, it would be wise to make the usual analysis of variance tests 
with the actual error mean square. 
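
The constant 820.7 is the theoretical variance 1/(4n) of an angle in radians, converted to degrees and multiplied by n. A short Python sketch of the test, using the error line of Table IV (the variable names are assumptions of this example):

```python
import math

n = 36.5                      # average number of ears per plot
error_ss, error_df = 982.67, 30

# Variance of the angle: 1/(4n) in radians, (180/pi)^2 / (4n) in degrees.
variance_constant = math.degrees(1.0) ** 2 / 4
expected_error_ms = variance_constant / n
chi_square = error_ss / expected_error_ms
observed_error_ms = error_ss / error_df
```

Carrying full precision gives a value of about 43.70; the 43.71 in the text comes from rounding the expected mean square to 22.48 first.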


6. Discussion. It must be emphasized that the solutions given above apply 
to the case where the whole of the experimental error variation is of the Poisson 
or binomial type. The methods are therefore likely to be useful in practice only 
where the experimental conditions have been carefully controlled, or where the 
data are derived from such small numbers that the Poisson or binomial variation 
is much larger than any extraneous variation. The x’ test is helpful in deciding 
whether this assumption is justified. Further, the examples worked above 
indicate that the transformed values form very good approximations on most 
plots. It will often be sufficient to adjust only those plots which give zero or 
very small values in the Poisson case, or zero or 100 percent values in the 
binomial case. In this connection the method of adjustment given above may 
perhaps be considered as an improvement on the empirical rule given by Bartlett 
[13] of counting n out of n as (n — 1/4) out of n. 

Where extraneous variation becomes important, as is probably the normal 
case with data derived from field experiments, there seem to be no theoretical 
grounds for using the adjusted values. If we were prepared to describe accu- 
rately the nature of the variation other than that of the Poisson or binomial 
type, a new set of maximum likelihood equations could be developed. These 
would, however, lead to a different type of adjustment. 

The justification for the use of transformations has no direct relation to the 
Poisson or binomial laws in this case, or in cases where percentages are derived 
from the ratios of two weights or volumes, as in chemical analyses, or from an 
arbitrary observational scoring. With percentages, for example, it may be 
said, without describing the experimental variation in detail, that the variance 
must vanish at zero and 100 percent and is likely to be greatest in the middle. 
The formula V = λPQ is at least a first approximation to this situation. The 
angular transformation will approximately equalize a distribution of variances 
of this type, provided that λ is sufficiently small. We have, of course, returned 
to an “approximate” type of argument. It follows that the original data should 


be scrutinized carefully before deciding that a transformation is necessary and 
that any presumed opinions about the nature of the experimental variation 
should be verified as far as possible. 


7. Summary. This paper discusses the theoretical basis for the use of the 
square root and inverse sine transformations in analyzing data whose experi- 
mental errors follow the Poisson and binomial frequency laws respectively. 

The maximum likelihood equations of estimation are developed for each case, 
but are in general too complicated for frequent use. If, however, the expected 
yield of any plot is assumed to be an additive function of the treatment and 
soil effects in the transformed scale, a transformation can be found so that the 
equations of estimation assume the simple “normal theory” form. The trans- 
forms are closely related to the square roots and inverse sines respectively. 

The nature of the assumed formula for the expected values is briefly discussed, 
and a x’ test is developed for the combined hypotheses that the prediction 
formula is satisfactory and that the experimental errors follow the assumed law. 

Numerical examples are worked for both types of transformation. These 
indicate that even for data derived from small numbers, the square roots or 
inverse sines are good estimates of the correct transforms on almost all plots, 
except those which give zero yields in the Poisson case, or percentages near 
zero or 100 in the binomial case. 
In practice, these new methods are not recommended to supplant the simple 
transformations for general use, because it can seldom be assumed that the 
whole of the experimental error variation follows the Poisson or binomial laws. 
The more exact analysis may, however, be useful (i) for cases in which the plot 
yields are very small integers or the ratios of very small integers, (ii) in showing 
how to give proper weight to an occasional zero plot yield. 
This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 

ORTHOGONAL POLYNOMIALS APPLIED TO LEAST SQUARE FITTING 
OF WEIGHTED OBSERVATIONS 


By Bradford F. Kimball 


1. Introduction. Let the independent variable be denoted by x, and let it 
range over n consecutive integral values $x_1$ to $x_n$. Thus x represents the 
index-number of the ordered intervals at which observations are taken, where 
the intervals are all of equal length, and an index-number is assigned in 
consecutive order to every interval within the range of investigation, whether 
observations occur in that interval or not. Let $y_x$ denote the observation measure 
(usually referred to as observed value), if such observation exists. Let $w_x$ denote 
the weight of that observation, with weight zero assigned where observations 
are lacking. 

To shorten the notation, summation over all values of x from $x_1$ to $x_n$ will be 
denoted by the sign Σ. If a subscript and superscript is used, the context will 
indicate the variable to which the summation refers. The rth binomial 
coefficient will be denoted by $\binom{x}{r}$. 


A system of polynomials $\phi_r(x)$, r = 0, 1, 2, 3, ··· of degree r in x is said to be 
an orthogonal system, for the purposes of this paper, if they satisfy the relations 


(1) $\sum w_x \phi_r(x)\phi_s(x) \begin{cases} = 0, & r \neq s \\ \neq 0, & r = s. \end{cases}$

To construct the polynomials, one may write them in the form 


(2) $\phi_0(x) = f_0(x) = \text{constant}$
    $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} h_i\phi_i(x), \quad r = 1, 2, 3, \cdots$


where the $h_i$ are constants and the $f_r(x)$ are arbitrary polynomials of degree r. 
It then follows from the conditions of orthogonality that 


(3) $h_i = \frac{\sum w_x f_r(x)\phi_i(x)}{\sum w_x[\phi_i(x)]^2}.$


Thus when the polynomials $f_r(x)$ have been chosen for all r, the system of 
orthogonal polynomials for a given set of weights can be constructed and is 
uniquely determined except for a constant factor [1]. 


By virtue of the relation (2) and the conditions of orthogonality (1), it follows 
that 


(4) $\sum w_x[\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x).$

Define the function Φ(r, k) by 


(5) $\Phi(r, k) = \sum w_x f_r(x)\phi_k(x).$

It follows from the relations (2) and (3) that 


(6) $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1}\frac{\Phi(r, i)}{\Phi(i, i)}\phi_i(x),$


where it is to be noted that this summation is independent of x. 
Define $q_r$ and $Y_r$ by 


(7) $q_r = \sum w_x[\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x) = \Phi(r, r),$
(8) $Y_r = \sum w_x y_x\phi_r(x).$


Then if $u_r(x)$ represents the polynomial solution of degree r of the normal 
equations set up for observed values $y_x$ and weights $w_x$, 


(9) $u_r(x) = \frac{Y_0}{q_0} + \frac{Y_1}{q_1}\phi_1(x) + \cdots + \frac{Y_r}{q_r}\phi_r(x).$

If $E^2$ denotes the weighted sum of the squares of the discrepancies between 
the ordinates $u_r(x)$ of the fitted curve and the observed values $y_x$, then [2], 


(10) $E^2 = \sum w_x[u_r(x) - y_x]^2 = \sum w_x y_x^2 - \sum_{i=0}^{r}\frac{Y_i^2}{q_i}.$

The practicability of the use of orthogonal polynomials is thus seen to depend 
upon whether the quantities Φ(r, k) and $Y_r$ can be evaluated in a reasonably 
simple manner. 


The thesis of this paper is that if $f_r(x)$ is taken as the binomial coefficient $\binom{x}{r}$, 
one can effectively apply the method of orthogonal polynomials. This is made 
possible by the use of factorial moments in conjunction with an adding machine 
that prints cumulative totals. 
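
The construction in relations (2) and (3), with the binomial coefficients as the starting polynomials, can be sketched in Python. This is a direct illustration using exact fractions rather than the factorial-moment technique developed below; the data, weights and function name are hypothetical, and $\phi_0(x) = 1$ is assumed.

```python
from fractions import Fraction
from math import comb

def orthogonal_fit(xs, ys, ws, degree):
    """Build phi_0 .. phi_degree from f_r(x) = C(x, r) by relations (2)
    and (3), then form q_r and Y_r as in (7) and (8) and the fitted
    ordinates u_r(x) as in (9)."""
    phi, q, Y = [], [], []
    for r in range(degree + 1):
        f = [Fraction(comb(x, r)) for x in xs]
        values = list(f)
        for i in range(r):
            # h_i = Phi(r, i) / q_i, as in relation (3)
            h = sum(w * fx * p for w, fx, p in zip(ws, f, phi[i])) / q[i]
            values = [v - h * p for v, p in zip(values, phi[i])]
        phi.append(values)
        q.append(sum(w * v * v for w, v in zip(ws, values)))
        Y.append(sum(w * y * v for w, y, v in zip(ws, ys, values)))
    fitted = [sum(Y[r] / q[r] * phi[r][j] for r in range(degree + 1))
              for j in range(len(xs))]
    return phi, q, Y, fitted

# Hypothetical observations; weight zero marks a missing observation.
xs = [0, 1, 2, 3, 4, 5]
ys = [2, 3, 0, 7, 8, 12]
ws = [1, 2, 0, 3, 1, 2]
phi, q, Y, fitted = orthogonal_fit(xs, ys, ws, degree=2)

# Formula (10): the direct and the shortcut forms of E^2.
direct = sum(w * (u - y) ** 2 for w, u, y in zip(ws, fitted, ys))
shortcut = (sum(w * y * y for w, y in zip(ws, ys))
            - sum(Yi * Yi / qi for Yi, qi in zip(Y, q)))
```

Exact arithmetic is used so that orthogonality and formula (10) hold exactly; the number of nonzero weights must exceed the degree, otherwise some $q_r$ vanishes.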

In treating the same problem Aitken sets up the normal equations in terms 
of factorials, but considers the explicit use of orthogonal polynomials imprac- 
tical. He writes: “the arbitrary nature of the weights stands in the way of 
any analytical sophistication; orthogonal polynomials emerge, but are not of 
great use; and the necessity of solving the moment equations cannot be circum- 
vented” [3]. He prefers a determinantal method of solution of the normal 
equations which the writer has found to be more involved, from a practical point 
of view, than the present method, although it is elegant from a theoretical 
standpoint. 

Thus although the present method is not new from the point of view of 
theory, the writer has found that forms made up by the use of the technique 
suggested below, offer an effective method for fitting polynomial curves to 
weighted observations. 


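2. Simplification of the problem when $f_r(x) = \binom{x}{r}$. Factorial moments $S_r$ 
and $M_r$ are defined by 


(11) $S_r = \sum\binom{x}{r}w_x, \quad M_r = \sum\binom{x}{r}w_x y_x, \quad r = 0, 1, 2, \cdots$


These moments are not difficult to compute and are readily checked as 
computed. The formula for Φ(r, k) then becomes 


(12) $\Phi(r, k) = \sum\binom{x}{r}w_x\phi_k(x).$

Thus since $\phi_0(x) = 1$, $\Phi(r, 0) = \sum\binom{x}{r}w_x = S_r$ and hence 


$\phi_1(x) = \binom{x}{1} - \frac{\Phi(1, 0)}{\Phi(0, 0)} = x - \frac{S_1}{S_0},$


$\Phi(r, 1) = \sum\binom{x}{r}w_x\left(x - \frac{S_1}{S_0}\right) = \sum x\binom{x}{r}w_x - \frac{S_1 S_r}{S_0} = (r + 1)S_{r+1} + rS_r - \frac{S_1 S_r}{S_0}.$


Again 


$q_1 = \Phi(1, 1) = 2S_2 + \left(1 - \frac{S_1}{S_0}\right)S_1.$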

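A recursion formula for Φ(r, k) may be obtained by expanding $\phi_k(x)$ in formula 
(12) by means of (6). Thus 


(13) $\Phi(r, k) = \sum\binom{x}{r}\binom{x}{k}w_x - \sum_{i=0}^{k-1}\frac{\Phi(k, i)}{\Phi(i, i)}\left[\sum\binom{x}{r}w_x\phi_i(x)\right] = \sum\binom{x}{r}\binom{x}{k}w_x - \sum_{i=0}^{k-1}\frac{\Phi(k, i)\,\Phi(r, i)}{\Phi(i, i)}.$


The first term can be easily expressed as a linear combination of binomial 
coefficients, and thus as a linear combination of moments $S_i$. 

The formula for $Y_r$ can be broken down as follows: 

(14) $Y_0 = \sum w_x y_x = M_0,$
     $Y_r = \sum w_x y_x\phi_r(x) = \sum w_x y_x\binom{x}{r} - \sum_{i=0}^{r-1}\frac{\Phi(r, i)}{\Phi(i, i)}\left[\sum w_x y_x\phi_i(x)\right],$
     $Y_r = M_r - \frac{\Phi(r, 0)}{q_0}Y_0 - \frac{\Phi(r, 1)}{q_1}Y_1 - \text{etc.}$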
0 


qi 


3. General technique of computation. In determining the best fitting 
polynomial of degree r, the ratios $\Phi(r, i)/q_i$ are seen to play an important part. 
In a form for calculation, these quantities should receive simple designations 
such as $b_i$ for a second degree curve, $c_i$ for a third degree curve, etc. Suppose 
they are designated by $R_i$ for a curve of degree r; then 


(15) $\phi_r(x) = \binom{x}{r} - \sum_{i=0}^{r-1}R_i\phi_i(x),$


(16) $Y_r = M_r - \sum_{i=0}^{r-1}R_i Y_i,$


(17) $q_r = \sum\binom{x}{r}^2 w_x - \sum_{i=0}^{r-1}R_i\Phi(r, i),$


and in determining Φ(r, k) for k = 0, 1, 2, ···, r - 1, formula (13) may be 
written 


(18) $\Phi(r, k) = \sum\binom{x}{r}\binom{x}{k}w_x - \sum_{i=0}^{k-1}R_i\Phi(k, i).$


The fact that these quantities $R_i$ appear as multipliers in so many of the 
fundamental formulas greatly simplifies the mechanics of the calculation, 
especially when a calculating machine is used. 

In the final determination of the polynomial curve the differences of the polynomial 
at x = 0 are readily determined, since the leading term of each orthogonal 
polynomial is a binomial coefficient and thus 


(19) $\Delta^k\phi_r(0) = -\sum_{i=k}^{r-1}R_i\Delta^k\phi_i(0), \quad k = 1, 2, 3, \cdots, r - 1,$
     $\Delta^r\phi_r(0) = 1.$
Since the effectiveness of the method depends upon the availability of an 
adding machine which records a cumulative subtotal, the determination of the 
curve from the differences at the point x = 0 is not a hardship and indeed 
affords a quick and accurate means of setting up the curve for purposes of 
plotting and checking. 


(20) $u_r(0) = \frac{Y_0}{q_0} + \frac{Y_1}{q_1}\phi_1(0) + \frac{Y_2}{q_2}\phi_2(0) + \cdots + \frac{Y_r}{q_r}\phi_r(0),$
     $\Delta^k u_r(0) = \frac{Y_k}{q_k} + \frac{Y_{k+1}}{q_{k+1}}\Delta^k\phi_{k+1}(0) + \cdots + \frac{Y_r}{q_r}\Delta^k\phi_r(0),$
     $\Delta^r u_r(0) = \frac{Y_r}{q_r}.$
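
Setting up a curve from its differences at x = 0 by cumulative addition can be sketched as follows. The Python function below imitates the repeated subtotals of an adding machine; its name and the example polynomial are assumptions of this illustration.

```python
from math import comb

def tabulate_from_differences(differences, count):
    """Given [u(0), delta u(0), delta^2 u(0), ...], return the ordinates
    u(0), u(1), ..., u(count - 1) by repeated cumulative addition."""
    state = list(differences)
    ordinates = []
    for _ in range(count):
        ordinates.append(state[0])
        # Each difference is advanced one step by adding the next higher one.
        for k in range(len(state) - 1):
            state[k] += state[k + 1]
    return ordinates

# Example: u(x) = x^2 + 1 has u(0) = 1, delta u(0) = 1, delta^2 u(0) = 2.
ordinates = tabulate_from_differences([1, 1, 2], 5)
```

The same ordinates follow from Newton's formula $u(x) = \sum_k \Delta^k u(0)\binom{x}{k}$, which is why a leading term in binomial-coefficient form is convenient here.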


The advantage of the use of orthogonal polynomials becomes particularly 
apparent when error formulae are to be used. The formula for the sum of the 
squares of the discrepancies, denoted by $E^2$, is given above (formula (10)). 
The estimated variance V of the weighted observations about the fitted curve 
is thus $E^2/(n - r - 1)$ where n is the number of values of x used in fitting 
and r is the degree of the curve fitted. Recalling that the matrix of the normal 
equations is of the diagonal form with diagonal elements $q_0, q_1, \cdots, q_r$, it 
follows that the coefficient $Y_i/q_i$ of $\phi_i(x)$ in the expansion of $u_r(x)$ has the 
variance $V/q_i$. 

Furthermore the variance of the ordinate of the fitted curve $u_r(x)$ at a point x 
due to sampling variations in the determination of the coefficients of the curve, 
under the assumption that the weights and values of the independent variable x 
do not involve errors, has the simple form 


(21) Variance of $u_r(x)$ at point x $= V\left[\frac{\phi_0^2(x)}{q_0} + \frac{\phi_1^2(x)}{q_1} + \cdots + \frac{\phi_r^2(x)}{q_r}\right]$


since the covariances of the orthogonal polynomials are zero [4]. 
COMBINATORIAL FORMULAS FOR THE rth STANDARD MOMENT 
OF THE SAMPLE SUM, OF THE SAMPLE MEAN, 
AND OF THE NORMAL CURVE 


By P. S. Dwyer 


The standard moments of the normal curve are usually expressed by the two 
statements [1, p. 97] 


(1) $\alpha_{2s} = \frac{(2s)!}{2^s\,s!}, \qquad \alpha_{2s+1} = 0.$


It is of some interest to note that these two statements may be generalized into 
a single statement by observing that $\frac{(2s)!}{2^s\,s!}$ is the number of ways in which 2s 
things can be grouped in pairs and that 0 is the number of ways in which 2s + 1 
things can be grouped in pairs. It is obvious that an odd number of things 
can not be grouped in pairs since there must be at least one unpaired unit. It 
is clear, too, that the number of orders in which 2s things can be grouped in 
pairs is $\binom{2s}{2}\binom{2s - 2}{2}\binom{2s - 4}{2}\cdots\binom{4}{2}\binom{2}{2}$ and this is $\frac{(2s)!}{2^s}$. However if the 
resulting paired groups (rather than the orders of grouping) are counted it is 
seen that each paired grouping is repeated s! times so that $\frac{(2s)!}{2^s\,s!}$ represents the 
number of ways 2s things can be grouped in pairs. If we arbitrarily define the 
number of ways 0 things can be grouped in pairs to be 1 (or if we limit our 
theorem to values of r > 0) we may say “The rth standard moment of the 
normal curve is equal to the number of ways in which r things can be grouped 
in pairs.” 
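
The counting argument can be confirmed by listing the pairings outright. The Python sketch below is an illustration; the generator `pairings` and the helper `formula` are names assumed for this example, and the empty set is counted as having one pairing, in line with the convention mentioned above.

```python
from math import factorial

def pairings(units):
    """Yield every way of grouping the listed units into unordered pairs."""
    if not units:
        yield []
        return
    first, rest = units[0], units[1:]
    # The first unit must be paired with one of the others.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def formula(s):
    """(2s)! / (2^s s!), the standard moment of order 2s of the normal curve."""
    return factorial(2 * s) // (2 ** s * factorial(s))

counts = [len(list(pairings(list(range(r))))) for r in range(9)]
```

The counts for r = 0, 1, ..., 8 alternate between the even-order moments 1, 1, 3, 15, 105 and zero.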

As presented above the combination representation is used primarily as a 
means of unification of results. However, it is possible to derive the standard 
moments of the normal curve in such a way as to indicate the term $\binom{r}{2^{r/2}}$ early 
in the proof and to trace it throughout the proof. I follow the method outlined 
by H. C. Carver [2] in obtaining the normal distribution as the limit of the 
distribution of sample sums (or of sample means) though I use a somewhat 
different notation [3, p. 5]. If we let $\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$ represent the number of 
ways in which r units can be collected with $\pi_1$ groups containing $p_1$ units, $\pi_2$ 
groups containing $p_2$ units, etc., then the multinomial theorem can be expressed 
as [3, p. 17] 


(2) $\left(\sum x\right)^r = \sum\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}(p_1^{\pi_1}\cdots p_s^{\pi_s}),$
where the summation is taken over all possible partitions $p_1^{\pi_1}\cdots p_s^{\pi_s}$ of r and 
the expression $(p_1^{\pi_1}\cdots p_s^{\pi_s})$ represents the power product form [8, p. 14] which 
is $\pi_1!\,\pi_2!\cdots\pi_s!$ times the monomial symmetric function. If ρ represents the 
number of parts of the partition then 


$\rho = \pi_1 + \pi_2 + \cdots + \pi_s,$


while 


$r = p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s.$





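Now it can be shown from (2) in the case of infinite sampling that 

(3) $\bar\mu_{r:\Sigma x} = \sum\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}} n^{(\rho)}(\bar\mu_{p_1})^{\pi_1}\cdots(\bar\mu_{p_s})^{\pi_s},$


and since $\bar\mu_1 = 0$, it is only necessary to sum over all partitions which have no 
unit part. We have then, dividing by $[\bar\mu_{2:\Sigma x}]^{r/2} = [n\bar\mu_2]^{r/2}$, 


(4) $\alpha_{r:\Sigma x} = \sum\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\frac{n^{(\rho)}}{n^{r/2}}(\alpha_{p_1})^{\pi_1}\cdots(\alpha_{p_s})^{\pi_s}.$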


We have now a formula for the rth standard moment of the sample sum which
is expressed essentially in combination notation since the quantity $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$
represents the number of ways in which r units can be grouped to form $\pi_1$
groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc. All non-unitary
groupings of r are formed, each combinatorial coefficient is computed and multi-
plied by $n^{(\rho)}/n^{r/2}$ times the product of the corresponding $\alpha$'s, and the sums are
formed. It might be noted that the formula for the rth standard moment of
the sample mean is identical with (4) while the corresponding finite sampling 
(without replacements) formula is 


E ° Ftc e 
(5) ir) = ie ” * : “a See (ay,)"" “— (ap,)"*. 
The P's are defined in previous papers [2, p. 105-6], [3, p. 113].

We obtain the formula for the rth standard moment of the normal curve by
taking the limit of (4) as $n \to \infty$. (H. C. Carver has pointed out [2, p. 121]
that this method of derivation imposes fewer restrictions than does the derivation
from Hagen's hypothesis.) Each partition term will approach zero as n
approaches infinity if $\rho < \tfrac{1}{2}r$. Now the only non-unitary partition in which
$\rho$ is not less than $\tfrac{1}{2}r$ is the partition $2^{r/2}$ and we can have this partition only when
r is even. Now the limit as n approaches infinity of $n^{(r/2)}/n^{r/2}$ is unity and we
have, in the limiting case

(6) $\alpha_r = \begin{cases} \binom{r}{2^{r/2}} & \text{if } r \text{ is even} \\ 0 & \text{if } r \text{ is odd.} \end{cases}$
Since $\binom{r}{2^{r/2}}$ is the number of ways r units can be grouped in pairs when r is
even and since 0 is the number of ways r units can be grouped in pairs where
r is odd, it follows that the rth standard moment of the normal curve is the
number of ways in which r units can be grouped in pairs.

This development is of interest in that it makes possible the tracing of the
value $\binom{r}{2^{r/2}}$ back through the various stages of the development to the coefficient
of $(2^{r/2})$ in the power product expansion of the multinomial theorem.
ON A METHOD OF SAMPLING¹
By E. G. Olds


It is recorded that Diogenes fared forth with a lantern in his search for an 
honest man. History does not tell us how many dishonest men he encountered 
before he found the first honest one but, judging from the fact that he took his 
lantern, apparently he expected to have a long search. The general problem of 
sampling inspection, of which the above is a special case, can be stated as follows: 

Given a lot, of size m, containing s items of a specified kind. If items are
to be drawn without replacement until i of the s items have been drawn, how
many drawings, on the average, will be necessary?

Uspensky² has solved a problem concerning balls in an urn, from which the
answer to the above question can be obtained for the special case i = 1. For
the general case, the distribution for the number n of the drawing in which the
ith specified item appears, is given by terms of the series:

(1) $\sum_{n=i}^{m-s+i} \nu_0 = \sum_{n=i}^{m-s+i} \frac{C_{n-1,i-1}\,C_{m-n,s-i}}{C_{m,s}}$


¹ Presented to The Institute of Mathematical Statistics, Dec. 27, 1938, at Detroit, Mich.,
as part of a paper, entitled "Remarks on two methods of sampling inspection."

² J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York,
1937, p. 178.

where the first symbol indicates the number of ways of choosing i − 1 of the
specified items to fill the first n − 1 places, the second symbol indicates the
number of ways of disposing of s − i specified items in the last m − n places,
and the denominator gives the number of ways that the s items can be scattered
through the lot. In order to get the average number of draws we multiply
$\nu_0$ by n and sum. Then we have


(2) $\nu_1 = \sum_{n=i}^{m-s+i} \frac{n\,C_{n-1,i-1}\,C_{m-n,s-i}}{C_{m,s}} = \frac{i(m+1)}{s+1} \sum_{n=i}^{m-s+i} \frac{C_{n,i}\,C_{m-n,s-i}}{C_{m+1,s+1}} = \frac{i(m+1)}{s+1}.$

Example 1. On a table of 200 bargain shirts there are 5 which have a 15 in.
neckband and 35 in. sleeves. How many shirts must be examined, on the
average, to find two of the desired kind?

Solution. For this case, m = 200, s = 5, i = 2. Therefore $\nu_1$ = [2(201)] ÷
6 = 67. Thus, an average of 67 shirts must be examined.
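The average in Example 1 can be confirmed by summing the distribution term by term. The following is a minimal sketch under the definitions of the note; the function names are hypothetical and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import comb


def draw_distribution(m, s, i):
    """P(n) for n = i, ..., m - s + i: the ith specified item appears on draw n."""
    total = comb(m, s)
    return {n: Fraction(comb(n - 1, i - 1) * comb(m - n, s - i), total)
            for n in range(i, m - s + i + 1)}


def expected_draws(m, s, i):
    """Mean number of draws obtained by multiplying each term by n and summing."""
    return sum(n * p for n, p in draw_distribution(m, s, i).items())


# Example 1: 200 shirts, 5 of the desired kind, two wanted.
print(expected_draws(200, 5, 2))      # direct summation
print(Fraction(2 * (200 + 1), 5 + 1)) # closed form i(m + 1)/(s + 1)
```

Both lines print 67, the figure given in the solution.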

Suppose $\mu_K$ represents the Kth moment about the mean, $\nu_K$ the Kth moment
about the origin, and $\bar\nu_K$ the moment relation given by

(3) $\bar\nu_K = (\nu + K - 1)^{(K)},$

where $(\nu + K - 1)^{(K)}$ represents the result of expanding $(\nu + K - 1)(\nu + K - 2) \cdots \nu$ and
changing the exponent of $\nu$ to the corresponding subscript. (For example,
$\bar\nu_3 = (\nu + 2)^{(3)} = \nu_3 + 3\nu_2 + 2\nu_1$.) It is easy to derive the recurrence relation

(4) $\bar\nu_K = \frac{(i + K - 1)(m + K)}{s + K}\,\bar\nu_{K-1}.$
From this result the computation of the moments about the mean is theoretically 
direct. Actually the results do not seem to be very compact. The variance is 
given by 
(5) $\mu_2 = \frac{(m + 1)(m - s)}{(s + 1)^2 (s + 2)}\,[i(s + 1) - i^2].$
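As a check on the variance, the second moment about the mean can be computed from the distribution itself and compared with the closed form $(m+1)(m-s)[i(s+1) - i^2] / [(s+1)^2(s+2)]$. This is a sketch only; the function names are hypothetical, and the closed form is the one read from the garbled display together with its bracketed factor.

```python
from fractions import Fraction
from math import comb


def draw_distribution(m, s, i):
    """P(n) that the ith specified item appears on draw n."""
    total = comb(m, s)
    return {n: Fraction(comb(n - 1, i - 1) * comb(m - n, s - i), total)
            for n in range(i, m - s + i + 1)}


def exact_variance(m, s, i):
    """Variance of n by direct summation over the distribution."""
    dist = draw_distribution(m, s, i)
    mean = sum(n * p for n, p in dist.items())
    return sum((n - mean) ** 2 * p for n, p in dist.items())


def closed_form_variance(m, s, i):
    """(m + 1)(m - s)[i(s + 1) - i^2] / [(s + 1)^2 (s + 2)]."""
    return Fraction((m + 1) * (m - s) * (i * (s + 1) - i * i),
                    (s + 1) ** 2 * (s + 2))


# Compare the two for every small lot.
agree = all(exact_variance(m, s, i) == closed_form_variance(m, s, i)
            for m in range(1, 12)
            for s in range(1, m + 1)
            for i in range(1, s + 1))
print(agree)
```

The comparison holds for all lots up to size 11, which supports the reading of the formula adopted here.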
In case s is unknown and n is known for a particular value of i, we may
estimate s, (or rather $\frac{1}{s+1}$), by using the relation $\nu_1 = \frac{i(m + 1)}{s + 1}$. Then

(6) $\left[\frac{1}{s+1}\right]_{\text{est.}} = \frac{n}{i(m + 1)},$


and the variance, using this estimate, is given by 


: 1 - n 1 n _ ne 
(7) Variance of (4 i ect. = P| i|fi = 


Example 2. In order to check a box of 144 screws, screws are drawn until
10 good screws are obtained. In a particular case only 10 drawings were necessary.
Estimate the number of good screws in the lot.

Solution. Here m = 144, i = 10, n = 10. The estimate for s is obtained
from $\left[\frac{1}{s+1}\right]_{\text{est.}} = \frac{10}{10(145)} = \frac{1}{145}$ and, as might be expected, the conclusion
is that all the screws are good. Furthermore the variance of the estimated
quantity is zero.
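The estimate in Example 2 amounts to solving n = i(m + 1)/(s + 1) for s. A short sketch follows; the function name is hypothetical and the result is left as an exact fraction, since the estimate need not be a whole number in general.

```python
from fractions import Fraction


def estimate_s(m, i, n):
    """Estimate of s from the observed draw count n, via 1/(s + 1) = n / (i(m + 1))."""
    return Fraction(i * (m + 1), n) - 1


# Example 2: 144 screws, 10 good screws found in the first 10 draws.
print(estimate_s(144, 10, 10))
```

For the screws this gives 144, that is, every screw in the box is estimated to be good.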

It is obvious that the number of draws necessary to obtain any particular
number of specified items is correlated with the numbers of draws for lesser
numbers of items. To investigate this, let us suppose that $n_i$ represents the
number of draws to obtain exactly i specified items and that $x_i = n_i - n_{i-1}$.
It follows immediately from our previous results, that

(8) $E(x_1) = E(x_2) = E(x_3) = \cdots = \frac{m + 1}{s + 1}.$

This result could be obtained from the fact that, corresponding to any arrange-
ment of the lot for which $x_1 = a$ and $x_2 = b$, there is another arrangement
where $x_1 = b$ and $x_2 = a$, formed by moving a − b of the non-specified items
from the first group to the second. From this fact we see, also, that

(9) $E(x_1^2) = E(x_2^2) = E(x_3^2) = \cdots.$

But $x_1 = n_1$ and $\sigma^2_{n_1} = \frac{(m + 1)(m - s)}{(s + 1)^2 (s + 2)}\,[1(s + 1) - 1] = ds$.
Therefore,
(10) $\sigma^2_{x_1} = \sigma^2_{x_2} = \cdots = ds.$
But, from our previous formula we have

$\sigma^2_{n_2} = d(2s - 2)$, $\sigma^2_{n_3} = d(3s - 6)$, etc.
Since $n_2 = x_1 + x_2$, it follows that

$\sigma^2_{n_2} = \sigma^2_{x_1} + 2 r_{x_1,x_2}\,\sigma_{x_1}\sigma_{x_2} + \sigma^2_{x_2}$

where $r_{x_1,x_2}$ is the correlation between $x_1$ and $x_2$. Therefore,
(11) $r_{x_1,x_2} = -1/s.$

Also, since $x_1 = n_2 - x_2$, it follows that

(12) $r_{n_2,x_2} = \sqrt{\frac{s - 1}{2s}}.$

Likewise, from $x_2 = n_2 - x_1$, we get

(13) $r_{n_2,x_1} = \sqrt{\frac{s - 1}{2s}}.$
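The correlations just obtained can be verified by listing every arrangement of a small lot. The sketch below is illustrative only: the lot size m = 8 and s = 3 are arbitrary choices, the function names are hypothetical, and the quantities checked are the correlation −1/s between $x_1$ and $x_2$ and the squared correlation (s − 1)/(2s) between $n_2$ and $x_2$.

```python
from fractions import Fraction
from itertools import combinations


def gap_table(m, s):
    """Every equally likely placement of s specified items among m positions,
    recorded as the gaps (x_1, ..., x_s) between successive specified items."""
    rows = []
    for pos in combinations(range(1, m + 1), s):
        prev = (0,) + pos[:-1]
        rows.append(tuple(b - a for a, b in zip(prev, pos)))
    return rows


def second_moments(rows, f, g):
    """Return (cov(f, g), var(f), var(g)) as exact fractions."""
    n = len(rows)
    ef = Fraction(sum(f(r) for r in rows), n)
    eg = Fraction(sum(g(r) for r in rows), n)
    cov = Fraction(sum(f(r) * g(r) for r in rows), n) - ef * eg
    var_f = Fraction(sum(f(r) ** 2 for r in rows), n) - ef ** 2
    var_g = Fraction(sum(g(r) ** 2 for r in rows), n) - eg ** 2
    return cov, var_f, var_g


m, s = 8, 3
rows = gap_table(m, s)

cov, var_x1, var_x2 = second_moments(rows, lambda r: r[0], lambda r: r[1])
r_x1_x2 = cov / var_x1            # the two variances are equal, so this is r itself

cov_n2, var_n2, var_x = second_moments(rows, lambda r: r[0] + r[1], lambda r: r[1])
r_sq_n2_x2 = cov_n2 ** 2 / (var_n2 * var_x)   # squared correlation of n_2 and x_2

print(r_x1_x2, r_sq_n2_x2)
```

Working with the squared correlation avoids a square root and keeps the comparison exact.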


Finally, we obtain the three general results 


14 eli eli 
( ) Tas titn a 
(15) Ta;.2; = / _- t+)
St : 
=4 f___i(s—%) 
- tuum = MW EPG — TF 1) 


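Example 3. The cards of a deck are turned one by one until two aces have
appeared. The second ace appears when the 36th card is turned. How many
more cards should one expect to have to turn to find a third ace?

Then $E(n_2) = 2 \cdot \frac{53}{5} = \frac{106}{5}$, $E(x_3) = \frac{53}{5}$, and $r_{n_2,x_3} = -\sqrt{\frac{2}{4(4 - 2 + 1)}} = -\frac{\sqrt{6}}{6}$. Also
$\sigma_{x_3} = \sqrt{4d}$ and $\sigma_{n_2} = \sqrt{6d}$. Since $\bar{x}_3 - E(x_3) = r_{n_2,x_3}\,\frac{\sigma_{x_3}}{\sigma_{n_2}}\,(n_2 - E(n_2))$ we have

$\bar{x}_3 = \frac{53}{5} - \frac{\sqrt{6}}{6} \cdot \frac{2}{\sqrt{6}} \left(36 - \frac{106}{5}\right) = \frac{17}{3}.$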


Of course this result could have been obtained more directly by noting that 
there were two aces left among the 16 remaining cards. 
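The agreement between the regression estimate and the direct count can be seen with exact arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the regression slope is written as −1/(s + 1 − i), which is what $r\,\sigma_x/\sigma_n$ reduces to when the covariance of $n_i$ with a later gap is −id and the variance of $n_i$ is $d\,i(s + 1 - i)$.

```python
from fractions import Fraction


def regression_estimate(m, s, i, n_i):
    """Expected further draws to the next specified item, by linear regression
    of the next gap on the observed n_i."""
    step = Fraction(m + 1, s + 1)       # E(x_j) for every j
    slope = Fraction(-1, s + 1 - i)     # r * sigma_x / sigma_n
    return step + slope * (n_i - i * step)


def direct_estimate(m, s, i, n_i):
    """Same quantity from the s - i specified items left among m - n_i positions."""
    return Fraction(m - n_i + 1, s - i + 1)


# Example 3: second ace on the 36th card of a 52-card deck.
print(regression_estimate(52, 4, 2, 36), direct_estimate(52, 4, 2, 36))
```

Both expressions give 17/3, or about 5.7 further cards.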


Conclusion. The results given in this note might be useful when it is necessary
to estimate the number of items to be drawn in order to secure a desired
number of a particular type, such as may be the case in obtaining a sample
with previously defined characteristics. Also the note disproves such intuitive
notions as the one that when looking for a desired record, one is most likely to
have to search the whole pile to find it. As far as methods of sampling inspection
are concerned, the one implied in this note has little to recommend it.


CARNEGIE INSTITUTE OF TECHNOLOGY,
PITTSBURGH, PA.


RANK CORRELATION WHEN THERE ARE EQUAL VARIATES¹

By Max A. Woodbury

If there is given a set of number pairs

(1) $(X_1, Y_1), (X_2, Y_2), \cdots, (X_N, Y_N),$

we may assign to each variate its 'rank' (i.e. one more than the number of
corresponding variates in the set greater than the given variate). In this way
there is obtained a set of pairs of ranks

(2) $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N).$

¹ Presented at the fall meeting, Mich. section of the Math. Assn. of America, Nov. 18,
1939, Kalamazoo College.
If we assume that $X_i \ne X_j$ and $Y_i \ne Y_j$ when $i \ne j$ then it follows that
each integer from 1 to N appears once and only once in the x's and the same
holds for the y's. This leads at once to the formulas:

(3a) $\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = N(N + 1)/2,$

(3b) $\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = N(N + 1)(2N + 1)/6.$
When these results are substituted in the expression for the product moment
correlation coefficient we have after simplifying [1],

(4) $\rho = 1 - 6 \sum_{i=1}^{N} D_i^2 / N(N^2 - 1)$ where $D_i = x_i - y_i$.
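Formula (4), together with the ranking rule of the first paragraph, translates directly into a few lines of code. The following is a minimal sketch for the case without equal variates; the function names are hypothetical.

```python
from fractions import Fraction


def ranks(values):
    """Rank = one more than the number of variates greater than the given one."""
    return [1 + sum(1 for w in values if w > v) for v in values]


def rank_correlation(X, Y):
    """Formula (4); valid as written only when no two X's and no two Y's are equal."""
    x, y = ranks(X), ranks(Y)
    n = len(X)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


print(rank_correlation([1, 2, 3, 4], [10, 20, 30, 40]))  # same order
print(rank_correlation([1, 2, 3, 4], [40, 30, 20, 10]))  # reversed order
```

Identical orderings give 1 and completely reversed orderings give −1, as the formula requires.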

If we consider the case of equal variates and follow the rule for assigning
ranks given in the first paragraph, the resulting method is known as the bracket-
rank method. The use of (4) in the calculation of $\rho$ by this method is not
strictly valid, because not every integer appears in the summations and so
neither (3a) nor (3b) is true.

The more accurate mid-rank method assigns to each of the equal variates
the average of the ranks that would be assigned if we were to give them an
arbitrary order. This method preserves (3a) but not (3b). In this paper $\rho_M$
indicates the value of $\rho$ as calculated by (4) when the mid-rank method is used.

In a method due to DuBois [2], the equal variates are assigned the same rank
so as to satisfy (3b). In this case (3a) is not satisfied.

If we assign the ranks to the equal variates in an arbitrary way, then (3a)
and (3b) are of course satisfied and the use of (4) is valid. There are two
disadvantages to such a method; first, the equal variates are treated differently,
and second, the assignment of ranks is arbitrary. These difficulties are removed
if one uses the average of the values of $\rho$ corresponding to all possible ways of
arbitrarily assigning ranks to the equal variates. Since $\rho$ is linear in $\sum D_i^2$ the
average value of $\rho$ may be obtained from the average value of $\sum D_i^2$ and the use
of (4).

Let us first consider the simple case of two equal variates in one of the vari-
ables, say X. It is clear that there are only two possible ways of assigning
ranks, and that if we arrange the series by the assigned x ranks, the resulting
series differ only in the y ranks corresponding to the equal X variates. If we
denote the two x ranks to be assigned by m and m + 1 and the y's corresponding
for a particular arrangement by $y_m$ and $y_{m+1}$ we have for the average $\sum D_i^2$ the
expression

(5a) $\sum_{x=1}^{m-1} (x - y_x)^2 + \sum_{x=m+2}^{N} (x - y_x)^2 + \tfrac{1}{2}[(m - y_m)^2 + (m + 1 - y_{m+1})^2 + (m - y_{m+1})^2 + (m + 1 - y_m)^2].$

By the mid-rank method the corresponding expression is

(5b) $\sum_{x=1}^{m-1} (x - y_x)^2 + \sum_{x=m+2}^{N} (x - y_x)^2 + (m + \tfrac{1}{2} - y_m)^2 + (m + \tfrac{1}{2} - y_{m+1})^2.$
The correction $\Delta_2$ to be added to the mid-rank $\sum D_i^2$ to get the average $\sum D_i^2$ is,
by subtracting (5b) from (5a) and simplifying,

(6) $\Delta_2 = \tfrac{1}{2}.$

To get $\Delta_K$ in the more general case of several equal variates, we need only con-
sider the difference between the average value of $\sum D_i^2$ and that obtained by the
mid-rank method. If there are K equal X variates we may assign the ranks
in K! ways; this results in K! permutations of the y ranks for the sets arranged
in order of their assigned x ranks. In (K − 1)! permutations $y_{m+j}$ corresponds
to the x rank of m + i so that the correction to the mid-rank $\sum_{i=1}^{N} D_i^2$ is

(7) $\Delta_K = \frac{(K-1)!}{K!} \sum_{j=0}^{K-1} \sum_{i=0}^{K-1} (m + i - y_{m+j})^2 - \sum_{j=0}^{K-1} \left(m + \frac{K-1}{2} - y_{m+j}\right)^2$
$= \frac{1}{K} \sum_{j=0}^{K-1} \sum_{i=0}^{K-1} \left[(m + i - y_{m+j})^2 - \left(m + \frac{K-1}{2} - y_{m+j}\right)^2\right] = \frac{K(K^2 - 1)}{12}.$


It is to be noticed that the correction is positive and depends only on the number
of equal X variates. From this it can be concluded that for more than one
group of equal variates, no matter whether X's or Y's, we can obtain the average
$\sum D_i^2$ by computing a correction for each group and then adding these correc-
tions to get the total correction to the mid-rank $\sum D_i^2$. Then as before noted
we can by (4) calculate the average $\rho$ (denoted as $\bar\rho$).

This correction to $\sum D_i^2$ may be converted into a correction to $\rho_M$. That is
if $\delta_{N,K_j} = \frac{6\Delta_{K_j}}{N(N^2 - 1)} = \frac{K_j(K_j^2 - 1)}{2N(N^2 - 1)}$, then

(8) $\bar\rho = \rho_M - \sum_j \delta_{N,K_j},$

where the summation extends over all groups of equal variates, and $K_j$ is the
number of equal variates in the jth group.
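Relation (8) can be tested by brute force on a small set of pairs: average formula (4) over every arbitrary assignment of ranks to the equal variates, and compare with the mid-rank value less the corrections. The sketch below is illustrative only; the function names and the sample data are hypothetical, and ranks follow the rule that the greatest variate receives rank 1.

```python
from fractions import Fraction
from itertools import permutations, product


def rho(x, y):
    """Formula (4) applied to two lists of ranks."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / Fraction(n * (n * n - 1))


def tie_groups(values):
    """Map each distinct value to the indices at which it occurs."""
    groups = {}
    for idx, v in enumerate(values):
        groups.setdefault(v, []).append(idx)
    return groups


def first_rank(values, v):
    return 1 + sum(1 for w in values if w > v)


def mid_ranks(values):
    out = [None] * len(values)
    for v, idx in tie_groups(values).items():
        for j in idx:
            out[j] = first_rank(values, v) + Fraction(len(idx) - 1, 2)
    return out


def all_rankings(values):
    """Every way of arbitrarily ordering the equal variates."""
    groups = tie_groups(values)
    choices = [list(permutations(range(first_rank(values, v),
                                       first_rank(values, v) + len(idx))))
               for v, idx in groups.items()]
    for combo in product(*choices):
        out = [None] * len(values)
        for idx, perm in zip(groups.values(), combo):
            for j, r in zip(idx, perm):
                out[j] = r
        yield out


def average_rho(X, Y):
    vals = [rho(x, y) for x in all_rankings(X) for y in all_rankings(Y)]
    return sum(vals, Fraction(0)) / len(vals)


def corrected_rho(X, Y):
    """Mid-rank rho less the sum of K(K^2 - 1) / (2N(N^2 - 1)) over tie groups."""
    n = len(X)
    sizes = [len(idx) for col in (X, Y) for idx in tie_groups(col).values()]
    correction = sum(Fraction(k * (k * k - 1), 2 * n * (n * n - 1)) for k in sizes)
    return rho(mid_ranks(X), mid_ranks(Y)) - correction


X = [3, 3, 1, 2, 3]
Y = [1, 2, 2, 5, 4]
print(average_rho(X, Y) == corrected_rho(X, Y))
```

Groups of size one contribute nothing to the correction, so they need no special handling.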

A table of $\delta_{N,K}$ for different values of N and K is given, and also a table of
$\Delta_K$. The values $\Delta_K$ are given in the top row of the table, while the $\delta_{N,K}$ are
given in the rows below.








Table of $\Delta_K$ and $\delta_{N,K}$

K 2 3 4 5 6 7 8 9 10 11 12 13

$\Delta_K$ 0.5000 2.000 5 10 17.5 28 42 60 82.5 110 143 182

N $\delta_{N,K}$
4 0500 2000 — — — — — — — — — —
5 0250 1000 2500 — — — — — — — — —
6 0143 0571 1429 2857 — — — — — — — —
7 0089 0357 0893 1786 3125 — — — — — — —
8 0060 0238 0595 1190 2083 3333 — — — — — —
9 0042 0166 0417 0833 1458 2333 3500 — — — — —
10 0030 0121 0303 0606 1061 1697 2546 3636 — — — —
11 0023 0091 0227 0455 0795 1273 1909 2727 3750 — — — 
12 0017 0070 0175 0350 0612 0979 1469 2098 2885 3836 — — 
13 0014 0055 0137 0275 0480 0769 1154 1648 2266 3022 3929 — 
14 0011 0044 0110 0220 0385 0615 0923 1319 1813 2418 3143 4000 
15 0009 0036 0089 0179 0313 0500 0750 1071 1473 1964 2554 3250 
16 0007 0029 0074 0147 0257 0412 0618 0882 1213 1618 2103 2676 
17 0006 0025 0061 0123 0214 0343 0515 0735 1011 1348 1752 2230 
18 0005 0021 0052 0103 0181 0289 0433 0619 0851 1135 1476 1878 
19 0004 0018 0044 0088 0154 0246 0368 0526 0724 0965 1254 1596 
20 0004 0015 0038 0075 0132 0211 0316 0451 0620 0827 1075 1368 
21 0003 0013 0032 0065 0114 0182 0273 0390 0536 0714 0929 1182 
22 0003 0011 0028 0056 0099 0158 0237 0339 0466 0621 0807 1028 
23 0002 0010 0025 0049 0086 0138 0208 0296 0408 0543 0708 0899 
24 0002 0009 0022 0043 0076 0122 0183 0261 0359 0478 0622 0791 
25 0002 0008 0019 0038 0067 0108 0162 0231 0317 0423 0550 0700 
26 0002 0007 0017 0034 0060 0096 0144 0205 0282 0376 0489 0622 
27 0002 0006 0015 0031 0053 0085 0128 0183 0252 0336 0437 0556 
28 0001 0005 0014 0027 0048 0077 0115 0164 0226 0301 0391 0498 
29 0001 0005 0012 0025 0043 0069 0103 0148 0203 0271 0352 0448 
30 0001 0004 0011 0022 0039 0062 0093 0133 0184 0245 0318 0405 
35 0001 0003 0007 0014 0025 0039 0059 0084 0116 0154 0200 0255 
40 0000 0002 0005 0009 0016 0026 0039 0056 0077 0103 0134 0171 
45 0000 0001 0003 0007 0012 0018 0028 0040 0054 0072 0094 0120 
50 0000 0001 0002 0004 0007 0011 0016 0023 0032 0043 0055 0070 
60 0000 0001 0001 0003 0005 0008 0012 0017 0023 0031 0040 0051 
70 0000 0000 0001 0002 0003 0005 0007 0010 0014 0019 0025 0032 
80 0000 0000 0001 0001 0002 0003 0005 0007 0010 0013 0017 0021 
90 0000 0000 0000 0001 0001 0002 0003 0005 0007 0009 0012 0015 
100 0000 0000 0000 0000 0001 0002 0003 0004 0005 0007 0009 0011
As an example of the use of the table we will consider the following problem,
[2, p. 56], with the ranks assigned as for the mid-rank method.

Subject   I     II
A         1     2.5
B         4     10
C         4     2.5
D         4     5
E         4     7
F         4     2.5
G         7     8
H         8     2.5
I         9.5   6
J         9.5   12
K         11    11
L         13    13
M         13    9
N         13    14

For the mid-rank method we have

$\sum_{i=1}^{14} D_i^2 = 119.5$, N = 14,

$\rho_M = 1 - \frac{6(119.5)}{14(196 - 1)} = 0.7374.$

Referring to the table we find that

$K_j$     $\Delta_{K_j}$   $\delta_{N,K_j}$
2         0.5      0.0011
3         2.0      0.0044
4         5.0      0.0110
5        10.0      0.0220

Total    17.5      0.0385

We know that $\bar\rho = 1 - \frac{6(119.5 + 17.5)}{14(196 - 1)} = 0.6989$ and in terms of $\delta_{N,K}$

$\bar\rho = 0.7374 - 0.0385 = 0.6989$


The value given by DuBois for his method is 0.7511. 
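The arithmetic of this example is easily reproduced. In the sketch below the two rank columns are transcribed for subjects A to N; the entries for subjects G and K are not legible in the source and are taken here as (7, 8) and (11, 11), an assumption which is the only completion of the ranks 1 to 14 that gives the stated total of 119.5. Variable names are hypothetical.

```python
from collections import Counter

# Mid-ranks for subjects A..N (G and K as assumed above).
rank_I = [1, 4, 4, 4, 4, 4, 7, 8, 9.5, 9.5, 11, 13, 13, 13]
rank_II = [2.5, 10, 2.5, 5, 7, 2.5, 8, 2.5, 6, 12, 11, 13, 9, 14]

n = len(rank_I)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_I, rank_II))
rho_mid = 1 - 6 * sum_d2 / (n * (n * n - 1))

# Sizes K of the groups of equal variates in either column.
group_sizes = sorted(k for col in (rank_I, rank_II)
                     for k in Counter(col).values() if k > 1)
delta_total = sum(k * (k * k - 1) / 12 for k in group_sizes)
rho_bar = 1 - 6 * (sum_d2 + delta_total) / (n * (n * n - 1))

print(group_sizes)
print(sum_d2, delta_total, round(rho_mid, 4), round(rho_bar, 4))
```

The group sizes come out as 2, 3, 4 and 5, the total correction as 17.5, and the two coefficients as 0.7374 and 0.6989, in agreement with the figures of the example.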


Conclusion. A method has been developed for the treatment of rank correlation
where there are groups of equal variates. The method consists of applying
a generally small correction to the value as ordinarily calculated by the mid-rank
method in order to find the value which would be obtained by averaging
the values of the rank correlation coefficient for all possible ways of arbitrarily
assigning ranks to the equal variates. Thanks are due Professor P. S. Dwyer,
without whose aid and encouragement this paper would not have been written.
NOTE ON THEORETICAL AND OBSERVED DISTRIBUTIONS OF
REPETITIVE OCCURRENCES


By P. S. Olmstead


1. A simple problem of repetitive occurrences. Two questions which the
engineer often desires to answer whenever he has a new type of apparatus or a
new design of an old type of apparatus are: How many times will it perform
its intended function without failure? and How many times will it fail to perform
its intended function in a given length of time? To do this, he selects a number
of what he believes to be identical units of the apparatus and gives each unit a
performance test under a uniform test procedure. The number of satisfactory
operations prior to the first observed failure to perform this operation is called
a "run" and is a measure of the type desired for each unit.

If it is assumed that the probability of failure at any operation is a constant, q,
and the probability of satisfactory operation is 1 − q or p, then the mathematical
probability of runs of 0, 1, 2, 3, ··· satisfactory operations for any
unit are

(1) $q,\ pq,\ p^2 q,\ p^3 q,\ \cdots$

respectively.

Let x denote the number of satisfactory operations in any run. The mean
value of x, say $m_x$, is given by

(2) $m_x = \frac{p}{q}.$

The variance of x is

(3) $\sigma_x^2 = \frac{p}{q^2}.$

The first step in practice is to determine whether there exists a constant
probability, p, by means of the application of the operation of statistical control.¹
Expressions (1), (2), and (3) provide the necessary information for doing
this. When a constant probability exists as evidenced by at least 25 consecutive
samples of 4 units each the following practical procedure has been found
to be satisfactory.

1. An estimate of p (or q), the sole parameter of the distribution, can be
obtained from the average length of run in the sample. If p is less than 0.6
and if the sample size is large, a reasonably good estimate of p can be obtained
from the proportion of the sample having runs of zero length.

2. The probability of getting runs of length x or more is $p^x$. Thus, if a
minimum (or maximum) value of the probability, $p^x$, is chosen, a maximum
(or minimum) expected length of run can be computed for use as a criterion
for looking for assignable causes of variation in the length of individual runs
by using the estimated value of p.

¹ W. A. Shewhart, "Statistical Method from the Viewpoint of Quality Control," The Department
of Agriculture Graduate School, Washington, 1939, Chapter I.

3. The average and standard deviation to be used in calculating the limits
to be applied to successive samples of rational sub-groups in accordance with
the Shewhart² Criterion I are given by Equations (2) and (3) in which the
estimates of p and q are substituted.
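The three steps of the procedure can be sketched in code as follows. This is an illustration under stated assumptions and not the author's own computation: the function names are hypothetical, and the limits are taken as the mean plus or minus three standard deviations of the subgroup average, which is the usual reading of Criterion I but is an assumption here.

```python
import math


def estimate_p_from_mean(mean_run):
    """Step 1: invert m_x = p / (1 - p) to estimate p from the average run length."""
    return mean_run / (1 + mean_run)


def estimate_q_from_zero_runs(zero_runs, sample_size):
    """Step 1 (alternative): proportion of the sample having runs of zero length."""
    return zero_runs / sample_size


def run_length_criterion(p, tail_probability):
    """Step 2: smallest x for which P(run >= x) = p**x is at or below the chosen level."""
    return math.ceil(math.log(tail_probability) / math.log(p))


def control_limits(p, subgroup_size, width=3.0):
    """Step 3: limits for subgroup averages from Equations (2) and (3).
    The width of three standard deviations is an assumption."""
    q = 1 - p
    mean = p / q
    sigma = math.sqrt(p) / q
    half = width * sigma / math.sqrt(subgroup_size)
    return max(0.0, mean - half), mean + half


p_hat = estimate_p_from_mean(4.0)
print(p_hat)
print(run_length_criterion(p_hat, 0.01))
print(control_limits(p_hat, 4))
```

With an average run of 4 operations the estimate of p is 0.8, and a run of 21 or more operations would occur with probability below 0.01. The lower limit is cut off at zero because a run length cannot be negative.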





2. Application to a signal transmission problem. The theoretical solution 
given above is a direct answer to the first question at the head of this note. 





TABLE I

Observed distributions of runs of x occurrences of event E for various test periods of
apparatus life

No. of Occurrences                         Test Period
per Period, x          1     2     3     4     5     6     7     8     9    10

Sample Size n        958  1781  1222  1005   796   630   543   431   301   157



























The second question is also of interest particularly when failure to perform an 
operation does not impair the apparatus unit for performance of additional 
operations. In cases of this type, the engineer often lets his test continue for 
test periods of particular lengths, measured in numbers of operations or some- 
times in intervals of time (i.e., time intervals are often considered to be propor- 
tional to numbers of operations) and observes the number of failures during the 
test period for each unit. Thus, he may, after he has assured himself that 
control exists, arrange his data for each test period to show the frequency of 
occurrence of 0, 1, 2, 3, --- failures per unit. 

Data of this type which are typical of those found in other studies made 











² Loc. cit.
during the past two years are presented in Table I. These were obtained in a
signal transmission study in which the data for successive periods were obtained 


TABLE II

Comparison of observed and theoretical values of averages and variances for
distributions of Table I

                                              Test Period
Statistic                  1      2      3      4      5      6             7      8      9      10

$\hat q$, observed        .916   .853   .786   .719   .679   .646          .632   .617   .532   .491
Average, observed         .098   .171   .269   .381   .448   .543          .537   .633   .917  1.026
Average, theoretical*     .091   .172   .272   .390   .471   [illegible]   .583   .620   .881  1.039
Variance, observed        .091   .200   .343   .497   .556   [illegible]   .760  1.075  1.783  1.924
Variance, theoretical*    .098   .202   .345   .542   .693   .848          .924  1.005  1.658  2.117

* Based on assumption that $\hat q$ is the true value of q.


TABLE III
Theoretical distributions corresponding to distributions of Table I calculated by
using $\hat q = \frac{n_0}{n}$ as the true value of q

Occurrences                                        Test Period
per Period       1            2       3      4      5            6      7      8            9            10

0                [illegible]  1519.0  961.0  723.0  541.0        407.0  343.0  [illegible]  160.0        77.0
1                [illegible]  233.5   205.3  202.8  173.3        144.1  126.4  [illegible]  [illegible]  39.2
2                [illegible]  32.9    43.8   56.9   55.5         51.0   46.6   [illegible]  35.1         20.0
3                [illegible]  4.8     9.4    16.0   17.8         18.0   17.1   [illegible]  [illegible]  10.2
4                [illegible]  .7      2.0    4.5    [illegible]  6.4    6.3    [illegible]  [illegible]  [illegible]
5 to 8           [illegible]
9 or over        [illegible]

Sample
Size n*          958          1781    1222   1005   796          630    543    431          301          157

* The observed values of $n_0$ and n form the basis for the calculated distributions.


for separate units. Since each set of these data passed the scrutiny for control, 
there is justification for assuming that a statistical universe exists and that its 
functional form may be derived from the observed distribution. It was found 
that these data were consistent with the assumption that, where the probability
of non-occurrence of a failure on a unit in the test period was q, the probability
of exactly x failures on a unit was $p^x q$. This set of mathematical probabilities
is shown in (1) with q redefined to apply in this case to non-occurrence of a
failure.

Observed and "Theoretical" values of the averages and variances for the
observed distributions are shown in Table II. The basis for calculating the
theoretical values was to take the ratio (designated $\hat q$) of $n_0$ to n for each distribution
as the estimate of the true value, q. Distributions as shown in Table III


TABLE IV
Test of fit of theoretical to observed distributions (Table III and Table I, respectively)

Rows: $\chi^2$; Degrees of Freedom; $P_{\chi^2}$. Columns: Test Period 1 to 10.

$\chi^2$              2.24, 0.20, 0.32, 2.09, 9.[illegible], [illegible], 1.07, 3.98
Degrees of Freedom    1, [illegible], 2, 3, 3, 3, 3, 3, 4, 4
$P_{\chi^2}$          .90, .87, .55, .02, .87, .36, .10, .90

* Minimum frequency in cell for theoretical distribution taken as 5.


were calculated from each $\hat q$. These distributions were tested against the observed
distributions by means of the $\chi^2$ test with the results shown in Table IV,
which are all within reasonable limits of what might be expected when a constant
probability exists.
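The construction of the theoretical frequencies and of the $\chi^2$ statistic can be sketched as follows. This is an illustration only: the function names are hypothetical, the pooling of the upper cells until the last theoretical frequency reaches 5 is an assumption about how the minimum cell frequency was applied, and the figures n = 958 and $\hat q$ = .916 for the first test period are used merely as convenient inputs.

```python
def expected_frequencies(n, q_hat, max_x):
    """Theoretical frequencies n*q*p**x for x = 0, ..., max_x - 1,
    followed by one cell for 'max_x or over' (probability p**max_x)."""
    p = 1 - q_hat
    cells = [n * q_hat * p ** x for x in range(max_x)]
    cells.append(n * p ** max_x)
    return cells


def pool_tail(observed, expected, minimum=5.0):
    """Merge cells from the upper end until the last theoretical frequency
    is at least `minimum`."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_exp, last_obs = exp.pop(), obs.pop()
        exp[-1] += last_exp
        obs[-1] += last_obs
    return obs, exp


def chi_square(observed, expected):
    """Sum of (observed - theoretical)^2 / theoretical over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


theoretical = expected_frequencies(958, 0.916, 4)
obs, exp = pool_tail(theoretical, theoretical)
print(len(exp), chi_square(obs, exp))
```

Here the theoretical frequencies are compared with themselves, so the statistic is zero; with observed counts in the first argument the same call gives the test value. Three cells remain after pooling, which with two constants fixed from the data leaves one degree of freedom.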


3. Conclusions. When a constant probability applies to each operation in a 
repetitive process this note shows how to establish criteria for identifying signifi- 
cantly long or short lengths for individual runs and significantly high or low 
average lengths for groups of several runs. A problem taken from the field of 
signal transmission gives assurance of the existence of this type of distribution 
in practice. 